A SECOND STUDY OF CHARACTERISTICS OF 
GOOD AND POOR SPELLERS 


DAVID H. RUSSELL 


University of California, Berkeley 


Stephen Leacock, the Canadian humorist, once said that “People 
look on spelling as one of the troubles of childhood, like measles and 
Sunday School and having to obey Father.” Since spelling is still 
a ‘trouble’ to children, the causes of this phenomenon are of 
interest. 

The first study of characteristics of good and poor spellers by 
the present writer summarized pertinent research in spelling under 
the last two of four heads: (a) What words shall be taught? (b) 
At what grade levels shall these be presented? (c) What are the 
best methods of teaching and learning to spell? and (d) What 
characteristics of the individual child affect spelling ability? In 
the fifteen years that have elapsed since the first report (36), some 
advances have been made in the investigation of each of these 
four areas but, with the possible exception of the first, the progress 
has not been notable. The present account reviews briefly some of 
the problems studied during this time and presents further evidence 
in the fourth category concerned with the individual learner. 


WORD LISTS 


The pioneer work of Thorndike, Horn and others in obtaining 
measures of frequency of use as aids to creating a spelling list has 
been continued in the last fifteen years. The Thorndike-Lorge list 
(46) has been extended to include thirty thousand words. Rinsland 
(35) obtained the most complete record of words used in children’s 
writing so far available. Dolch (11) derived the two thousand 
‘commonest words for spelling’ from various vocabulary studies. 
Fitzgerald’s (13) list of 2,650 words included a shorter list of four 
hundred and forty-nine which he believes contains about 
three-fourths of the words children ordinarily write. Lorge’s (32) semantic 
count giving the frequency of meaning of five hundred and seventy 
multi-meaning words is a valuable resource that has not yet been 
fully utilized in the preparation of spelling texts and curriculum 
guides. 

Somewhat more specialized lists have been presented by differ- 
ent research workers. Not all of these are concerned with children’s 
written language but they have implications for spelling lists or 
texts. Kyte (30, 31) compiled two lists of one hundred and of five 
hundred and one useful words derived from other lists and judged 
highly useful in language arts activities, especially spelling. Stone 
(44) revised his vocabulary list in reading for primary grades. Dale 
(10) compiled a list of seventy-five studies of various types of 
vocabulary in use and added thirty-seven references on vocabulary 
grade placement. Stauffer (42) listed the prefixes found in 
Thorndike’s Teacher’s Word Book of 20,000 Words and found that fifteen 
prefixes accounted for eighty-two per cent of the total number of 
prefixes. Thorndike (45) also presented the commonest suffixes in 
English words of value in spelling instruction as well as in reading 
instruction. 

The careful word counts of the above studies were adapted to 
school use with the discovery, first by Ayres, and in more detail by 
Horn (26) and others that three thousand to four thousand of 
these commonest words made up from ninety-five to ninety-eight 
per cent of all words children ordinarily write. For some time, this 
finding seemed to create a clear directive for curriculum-makers 
and authors of spelling texts: Simply teach from three to four 
thousand of the commonest words and you will have taught chil- 
dren most of the words they need to know. Unfortunately, perhaps, 
this solution did not work out so easily in practice for it did not 
consider the many special words children want to write which do 
not appear on master-lists; it did not say when the basic words 
were to be taught; and it did not take account of the abilities and 
disabilities which help or hinder individual children in learning to 
spell. We now know beyond reasonable doubt the commonest 
words in English, but all the problems of the spelling program are 
not thereby settled. 


GRADE PLACEMENT OF WORDS 


The placement of words at different grade levels has usually been 
determined in a school’s course of study or a series of spelling texts 
on a multiple basis. Some of the usual criteria are: (1) Words easy 
to spell are placed in the lower grades, (2) The most frequently 
used words appear early in the grades, (3) Spelling words should 
be related to words used in related language arts, social studies and 
other curricular activities, (4) Common words used by adults more 
than children should be placed in the upper grades, (5) Words 
should be taught, if possible, in the first grade in which children 
write them. 

Perhaps the multiple criteria used by different authors have been 
responsible for the findings in two studies by Betts (5, 6) that 
authors of spelling texts do not agree on the grade placement of 
words. In the first investigation involving 8,645 words in eighteen 
series of texts, Betts found that only one word was placed at the 
same grade level by all authors. In the second study, there was 
agreement in the grade placement of sixty-five of the four hundred 
and eighty-three words common to eight series of spellers. Fifty- 
five of these words were placed in the second grade. The disagree- 
ment found by Betts is not as great as it seems since many words 
were placed at adjacent grade levels. Indeed, the findings may be 
interpreted as desirable in the sense that different school systems 
or schools should place words earlier or later than others, depending 
on the children’s language backgrounds and other factors such as 
use. 

Even the small amount of agreement found by Betts is 
complicated further by the relationships between word frequency and 
word difficulty and by the fact that the most adequate grade 
placement leaves out of the spelling list many words that children 
of a certain grade may need to write. Research results are not 
clear-cut in regard to the first relationship. For example, as early 
as 1918 Hollingworth (25) showed that word meaning was a factor 
in spelling difficulty, but Wesman and Seashore (48) have found 
that frequency and ease of meaning are not identical. Several 
writers have shown that the factors other than frequency of use 
and meaning which made a word difficult to spell include such 
characteristics as length of word, vowel difficulties and unusual 
combinations of letters. Gates (17) has indicated the difficult spots 
in a list of 3,876 common words and Spache (40) has summarized 
the research on types of spelling errors. Studies such as these help 
define the first criterion given above as to which words are easy 
to spell and which more difficult and may, therefore, be of use in 
constructing word lists for different grade levels. 

In regard to the second complication, in a little study which 
needs further verification at other levels, Wilson (50) found that 
only fourteen per cent of the different words written by fifth- 
graders in letters, compositions, original stories and other school 
writing were to be found in their state spelling text. Curtis and 
Dolch (9) reported that most of the learning of spelling in one 
Illinois school system was done either before the year in which 
the words were taught from a text or after the year in which they 
were taught. Hildreth (24) has criticized the lack of usefulness of 
most spelling texts and prepared lists. She said: “Spellers in 
common use today in Grade II through Grade VIII are heavily loaded 
with spelling-contest words typical of our grandparents’ day— 
words that must be studied by children some years ahead of the 
infrequent times they will ever use them.” (p. 261) Horn (28), 
however, has sensibly stated that the issue is not an ‘either-or’ 
one of spelling list versus individual study but of some combination 
of the two approaches. It seems reasonable to deduce that the usual 
spelling list in a course of study or spelling text does not cover 
nearly all the child’s spelling needs and should be supplemented 
by locally prepared lists and planned instruction in spelling related 
to other curricular activities and to writing needs. 


METHODS OF TEACHING AND LEARNING TO SPELL 


The writing and research on methods of spelling instruction has 
broken little new ground during the last fifteen years. Two books, 
Dolch’s Better Spelling (11) and Fitzgerald’s Teaching of Spelling 
(14) contained many specific suggestions for instruction, some of 
which have well-established research backing. Wilson (49) traced 
the historical development of spelling instruction through three 
main periods from 1647 to the present, and noted current trends 
toward individualized teaching and toward achieving a closer rela- 
tionship between spelling and other curricular activities. Artley 
(1) reviewed some thirty references in deriving five principles of 
classroom instruction in spelling. These included provisions for 
individual differences, need for expression as motivation for learn- 
ing to spell, the necessity of direct instruction in spelling, the need 
of the child to become independent in spelling ability, and the 
desirability of favorable attitudes toward correct spelling. 

One fresher approach to spelling instruction has developed in an 
emphasis upon the relationship of spelling to the other language 
arts. Russell (38), Sparrow (41) and Townsend (47) have all found 
rather close relationships between spelling ability and a group of 
other language abilities. Artley (2) and Hildreth (23) have reviewed 
the research on interrelationships of the various language arts. 
Betts (5), Gates (18) and Yoakam (52) have warned that although 
spelling and reading are closely related in many ways they are also 
very different in some characteristics and purposes and may actu- 
ally interfere with one another. 

Another new approach to the problem of better spelling instruc- 
tion has developed within the area of analysis of spelling ability 
and disability. One attempt to study the composition of spelling 
ability by factor analysis was hampered by too narrow a definition 
of spelling and related abilities (29). A more comprehensive view 
of spelling ability was developed by Nichols (34) in a series of tests 
which included spelling achievement, proof-reading, word meaning, 
handwriting, visual discrimination and auditory discrimination. A 
number of studies have also continued the attempt to improve 
spelling instruction through the analysis of errors made by children. 
After reviewing some thirty studies, Spache (40) systematized a 
method of recording spelling errors and gave the expected frequency 
of the different types of errors. Wolff (51) compared errors made 
in written assignments, in regular weekly spelling tests, and on a 
standardized test in one fifth-grade class. Mechanical errors appeared 
most frequently in essay-type materials, owing to difficulties in 
handwriting and rules of English usage. Phonetic errors centered on 
certain hard spots in words; non-phonetic errors were associated 
with lack of knowledge of the word. 

Further analytical data on how children learn to spell were con- 
tributed by the Gilberts (20) in studies of eye movements. As in 
an earlier study (19), they found wide individual differences in the 
way children perceived words when learning to spell them. Com- 
binations of letters which were difficult for some children were 
easily mastered by others. Poor spellers took longer and looked at 
words irregularly; good spellers spent more time on the hard spots 
in words. Limiting the time of study and helping a pupil determine 
his own needs were effective techniques. The Gilberts’ results sug- 
gested that increasing amounts of the same study procedures will 
not improve spelling achievement, but that early diagnosis and 
individualized help will produce better habits of studying words. 

Stegeman’s (43) study of the importance of forgetting in spelling 
also does not argue for more instruction of the same kind but for 
properly spaced review. 

The fact that curricula are based not only on social demands 
and on the characteristics of children, but also on the nature of 
the materials to be learned has been exemplified once again in 
Hanna and Moore’s (22) suggestions for emphasizing a neglected 
phase of word study. They explored the possibilities of using the 
phonetic elements of our language in building a spelling program. 
More specifically, they recorded the consistency with which a 
certain speech sound is represented by one letter or combination 
of letters. They found that about four-fifths of the phonemes 
(speech sounds) in an elementary spelling vocabulary are repre- 
sented by a regular spelling but that vowel phonemes are much 
less consistent in their spelling than initial consonant phonemes. 
They suggested that the spelling program be planned to present 
words close together in which the phonemes have a high degree of 
spelling consistency, thus encouraging the children to generalize 
about their spelling. They believed: “Our first job in teaching 
children to spell is to make certain that they can hear the sounds of 
the words they are to spell.” (p. 336) Somewhat in opposition to 
these suggestions, Horn (27) found earlier that presenting spelling 
words in syllabified form offered no advantages. 

As suggested above, in planning a spelling program factors of 
usage and need must be considered in addition to such auditory 
perception of word parts. However, further analysis of auditory 
abilities was one phase of the research reported in the next sections. 


OTHER CHARACTERISTICS OF GOOD AND POOR SPELLERS 


The first study of the characteristics of good and poor spellers 
by the present writer (36) compared two groups of good and poor 
spellers, matched for sex, mental ability and chronological age, but 
differing in spelling ability, on eighteen tests of different abilities. 
Pupils were given audiometer tests and the Betts perception series 
using the Keystone telebinocular. In general, the results indicated 
that the good spellers reliably exceeded the poor spellers on such 
factors as word pronunciation, reading comprehension and speed, 
reading accuracy and a systematic attack on new words to be 
learned. Poor spellers reliably exceeded good spellers in mispronun- 
ciations, and use of a letter-by-letter method in studying words. 

The present study went beyond the first investigation in 
comparing good and poor spellers on auditory discrimination abilities, 
more specific visual perception, vocabulary and certain mental abil- 
ities included by the Thurstones in their SRA Primary Mental 
Abilities test. 

During the last fifteen years knowledge of children’s auditory 
abilities has been extended by a series of studies by Durrell and 
his students (12). Many of these have dealt with such abilities in 
relation to beginning reading, but Russell (37) has shown that 
there may be considerable overlap in verbal abilities basic to both 
reading and spelling in the primary grades. A recent study by 
Caffrey (8) of what he calls ‘auding’ has added considerably to 
what is known about auditory or listening skills. Caffrey experi- 
mented with tests of seven types of listening skills which included 
tests of vocabulary and several kinds of comprehension such as 
main idea and following directions. He also devised a test of audi- 
tory discrimination used in the present study. This test measured 
auditory abilities more specifically than those used in earlier in- 
vestigations by Bond (7) and by Russell (36). It included three 
subtests: (1) distinguishing between pairs as the same or different, 
such as making/making and shown/sown, (2) recognizing similar or 
different vowel sounds in words, as in wine/fight and move/love, (3) 
telling whether words were different in their beginning, middle or 
final sounds, as in butter/buzzer and pillow/billow. 

The present investigation also supplemented the earlier one in 
extending measures of visual discrimination and vocabulary. Each 
child was given an experimental form of a visual perception test 
developed by the writer which, in a limited time, required him to 
mark words as the same or different. Sample items were: 


here     S  D  herd 
portion  S  D  portion 


In addition, each child was given a series of seven vocabulary tests 
constructed by the writer in varied forms and measuring knowl- 
edge of words in different areas such as social studies, science, 
mathematics, sports and hobbies (39). Sample items were: 
a. The word copyright usually appears in connection with 
(a) kitchen gadgets (b) silver coins (c) machines (d) books 
(e) income tax forms. 
b. An ounce is commonly used to measure (a) potatoes 
(b) baby food (c) fish (d) lumber (e) electricity (followed by 
two other harder items on the same word). 


TABLE 1. DIFFERENCES BETWEEN GOOD SPELLERS (UPPER 27 PER CENT) 
AND POOR SPELLERS (LOWER 27 PER CENT) OF APPROXIMATELY 250 
CHILDREN IN THE FIFTH AND SIXTH GRADES 

Measure                           Good    Poor   Diff.      σD       t   rx.sp
                                  mean    mean
Auditory                         43.82   27.76   16.06    1.96    8.19     .54
Visual                           36.88   27.09    9.79     .91   10.76     .50
A + V                            80.70   54.85   25.85    2.02   12.80     .66
California Reading
  Comprehension                  71.20   41.58   29.62    6.17    4.80     .45
  Vocabulary                     72.06   23.47   48.59    4.48   10.84     .60
Experimental Vocabulary Tests
  Math                           41.24   29.79   11.45    3.72    3.08     .42
  Misc 1                         74.66   35.62   39.04    4.75    8.22     .31
  Misc 2                         41.10   25.76   15.34    3.04    5.05     .48
  Misc T                        115.00   62.88   52.12    6.91    7.54     .66
  Social Studies 1               30.57   18.05   12.52    2.54    4.93     .49
  Social Studies 2               83.76   62.67   21.09    5.06    4.17     .42
Primary Mental Abilities
  Perception                     62.11   37.05   25.06    4.86    5.16     .29
  Reasoning                      63.83   30.89   32.94    6.03    5.46     .51
  Space                          66.17   63.26    2.91    4.68     .62     .14
  Total                          60.81     27.   33.75    5.96    5.66     .58

t = Fisher’s t = difference between means divided by the standard error 
of that difference, sample size being taken into account in the computation 
equation. 
rx.sp = Pearson r = correlation between spelling grade and the measure 
shown. 

Other factors on which the groups were compared were the 
California Reading Test and the tests of the SRA Primary Mental 
Abilities Test. Table 1 gives a comparison of scores of approxi- 
mately two hundred and fifty children in Grades V and VI who 
were in the top and bottom twenty-seven per cent of the distribu- 
tion of scores on the spelling test of the Progressive Achievement 
Test. The results indicate that the group of good spellers exceeds 
the group of poor spellers in the same grades on fourteen out of 
fifteen of the measures used, the t score indicating that the 
differences are significant at the .01 level. Only on the Space test of the 
Primary Mental Abilities Test is there no significant difference 
between the good and poor spellers. (This is a test of recognition 
of letter-like and geometric figures which have been rotated or 
otherwise shifted in position.) The results indicate that good 
spellers at the fifth- and sixth-grade levels tend to have superior 
auditory and visual perception, that they score higher in reading 
comprehension and a wide variety of vocabulary tests, and that 
they are superior in perception, reasoning and the total scores of 
the Primary Mental Abilities Test. Although this superior spelling 
ability is associated with superior mental ability as measured by 
the Thurstone test, a number of studies have shown that the cor- 
relation between spelling ability and general mental ability is not 
high, usually ranging between .20 and .55 (36). Even the relatively 
high correlation obtained in the present study (.58) suggests that 
factors of perception and word meaning as well as general mental 
ability contribute to spelling ability. 
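
The t values in Table 1 can be checked against the other columns. The following is a minimal Python sketch, assuming the fourth numerical column of the table is the standard error of the difference, as the table note implies. The function name and the choice of three rows are illustrative only.

```python
def t_ratio(mean_good, mean_poor, se_diff):
    """Difference between group means divided by its standard error."""
    return (mean_good - mean_poor) / se_diff

# (good-speller mean, poor-speller mean, standard error of difference),
# copied from three rows of Table 1
rows = {
    "Auditory": (43.82, 27.76, 1.96),
    "Visual": (36.88, 27.09, 0.91),
    "PMA Space": (66.17, 63.26, 4.68),
}

for name, (good, poor, se) in rows.items():
    print(f"{name}: t = {t_ratio(good, poor, se):.2f}")
```

Of these three ratios only the one for Space falls below 1, which is consistent with the report of no significant difference on that test.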

The relationships between the different variables and spelling 
ability are also indicated in the last column of Table 1. It may be 
noted here that the highest correlations with spelling scores are 
those of the combined auditory-visual test and a total miscellane- 
ous vocabulary score. 

Further analysis of the relation of the auditory and visual per- 
ception abilities to spelling was of interest, so scattergrams showing 
the relationships were constructed. The eta test of curvilinearity 
of regression (33) was applied and eta was found to differ signifi- 
cantly from the Pearson r in predicting spelling from auditory plus 
visual (A + V) score or auditory and visual scores alone. The 
results suggested that the Pearson r’s of Table 1 somewhat under- 
estimated the relationships and also that there is a somewhat lower 
correlation between spelling ability and auditory discrimination 
than there is between spelling ability and visual discrimination. 
Further, the results suggested that the relationship between spelling 
score and A + V score decreases as spelling score increases. Since 
the mean spelling score of the lowest twenty-seven per cent was 
3.9 (in terms of grade) and of the highest twenty-seven per cent 
7.8, it may be stated that spelling ability was more closely related 
to auditory and visual abilities around the third- and fourth-grade 
levels of ability than around the seventh- and eighth-grade levels 
of ability. Put another way, poor spelling ability in the fifth and 
sixth grades is closely related to poor discrimination of auditory 
and visual differences but high spelling ability at these levels is 
not closely related to superior discrimination. It may be that for 
children spelling at the average seventh- or eighth-grade level of 
achievement a number of factors other than auditory and visual 
discrimination abilities affect spelling ability. 
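
The comparison of eta with the Pearson r may be illustrated as follows. The report does not give the computation, so this sketch assumes the usual correlation ratio of y on x (between-column sum of squares over total sum of squares). The scores are invented and the function names are hypothetical. It shows only how eta can exceed r when the regression is curvilinear.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Product-moment correlation between paired scores."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def correlation_ratio(xs, ys):
    """Eta of y on x: sqrt(between-column sum of squares / total sum of squares)."""
    columns = {}
    for x, y in zip(xs, ys):
        columns.setdefault(x, []).append(y)
    grand = mean(ys)
    ss_between = sum(len(c) * (mean(c) - grand) ** 2 for c in columns.values())
    ss_total = sum((y - grand) ** 2 for y in ys)
    return (ss_between / ss_total) ** 0.5

# Invented data: x is a perception score grouped into class intervals 1 to 5;
# y is a spelling score that rises quickly at first and then levels off.
xs = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
ys = [2, 3, 5, 6, 7, 8, 8, 8, 8, 9]

r = pearson_r(xs, ys)
eta = correlation_ratio(xs, ys)
print(f"r = {r:.3f}, eta = {eta:.3f}")
```

Because eta is computed from the column means rather than from a straight line, it can never fall below the absolute value of r for the same grouping. A large gap between the two is the sign of curvilinearity that the test looks for.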

The Pearson r between the auditory and visual tests was only 
.30, but the eta was .45. The regression lines for both variables 
were curvilinear but, in this case, predicting auditory from visual 
scores, or vice versa, was safer at the higher limits for both scores. 
That is, if a pupil scored high on the visual test he was likely to 
score high on the auditory test but if he scored low on either, pre- 
diction was less certain. The reliabilities of these experimental tests 
according to the Kuder Richardson formula 21 (16) were as follows: 


Auditory .93          Auditory + Visual .93 
Visual   .88 


CONCLUSION 


During the last fifteen years considerable progress has been made 
in determining the words most useful to children and in discovering 
why some words are learned easily and others are harder to spell. 
Studies of eye-movements have indicated in more detail how chil- 
dren look at words in learning to spell them. Relationships between 
spelling and other language arts have been explored but other 
studies of methods of learning and instruction have been meager. 

The present report added to a previous study of characteristics 
of good and poor spellers by further investigation of visual and 
auditory abilities in relation to spelling ability. It found that the 
upper twenty-seven per cent of approximately two hundred and 
fifty children in the fifth and sixth grades in terms of their spelling 
achievement significantly exceeded the lower twenty-seven per 
cent of the group on all fifteen tests used except the Space test of 
the SRA Primary Mental Abilities test. It discovered correlations 
between spelling ability and most of the other abilities tested, 
except Perception and Space on the Thurstone Test, which ranged 
in the .40’s to .60’s. It also found that visual and auditory dis- 
crimination abilities seem to be closely related to spelling ability 
around the third- and fourth-grade levels but not so closely related 
at the seventh- and eighth-grade levels of spelling ability. 


THE ORGANISMIC AGE CONCEPT 


The Organismic Age concept has had considerable appeal to 
elementary teachers. Perhaps one reason for this is the fact that 
its proponents have put it forward in a setting of enlightened school 
practices. In this paper we are not concerned with these practices 
but with the validity of the OA concept. 

Long since accepted is the idea that a child’s readiness for read- 
ing, arithmetic, third-grade science, and so on is a function of his 
mental development, total mental development. In this, the MA 
is the most important single component. But social development, 
emotional maturity, before-school experiences, language 
development, motor coordination, and perhaps other factors are also 
important. The OA is an average of age scores of mental data and 
age scores of anatomical and physiological data, which hereafter 
we shall call physical data. Commonly, OA is the average of mental 
age, reading age, dental age, grip age, metacarpal age, vital capac- 
ity age, height age, and weight age, although there is no hard and 
fast set of component measures. In a sense it combines measures 
of what is commonly called mental growth and measures of what 
we may call physical growth; and, because of the disproportionate 
number of measures of the latter kind, the average, OA, is weighted 
heavily in that direction. It is with the feasibility in school 
practice of making prediction of achievement, determining readiness, 
appraising achievement of pupils in terms of their anatomical and 
physiological development, as well as in terms of measures of men- 
tal development that we are concerned. Can we make better pre- 
dictions or interpretations (which amounts to the same thing) by 
so doing? 
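
The weighting just described can be made concrete with a short Python sketch. The eight components follow the list given above; the age scores (in months) are invented for one hypothetical child, and the function name is illustrative only.

```python
from statistics import mean

# Invented age scores, in months, for one hypothetical child.
age_scores = {
    "mental": 108, "reading": 104,
    "dental": 90, "grip": 92, "metacarpal": 88,
    "vital_capacity": 94, "height": 91, "weight": 93,
}
MENTAL = {"mental", "reading"}

def organismic_age(scores):
    """Unweighted mean of all component age scores."""
    return mean(scores.values())

oa = organismic_age(age_scores)
mental_mean = mean(v for k, v in age_scores.items() if k in MENTAL)
physical_mean = mean(v for k, v in age_scores.items() if k not in MENTAL)
physical_share = sum(k not in MENTAL for k in age_scores) / len(age_scores)

print(f"OA = {oa}, mental mean = {mental_mean}, physical mean = {physical_mean:.1f}")
print(f"share of physical measures in OA = {physical_share:.2f}")
```

With six of the eight components physical, the OA for this invented child lies much nearer the physical mean than the mental mean.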

Obviously the first questions one would ask in this connection 
are: Are measures of physical growth related (1) to mental growth, 
and (2) to academic achievement? Moreover, if found to be related, 
are they related to an extent that makes any practical difference? 

They are, of course, related in a spurious way. All the measures 
that go into OA are related one to another in this way. Any growth 
process requires time. Childhood is the period of growth. Most of 
the growth functions begin in early childhood, even before birth, 
and continue until maturity is reached. Naturally there is a certain 
relatedness or togetherness in the various growth functions. Chil- 
dren who wear Number 5 shoes are taller, heavier, have bigger 
bones, stronger grip, and more teeth than children who wear Num- 
ber 2 shoes. They also can read better, know more about world 
geography, and have higher MA’s. Educational development and 
mental development take time also. This is the meaning of growth. 
All the time the child is getting older. He does not grow because 
he gets older; but he must get older before he can grow to a larger 
size. Childhood is not only the time for physical growth; it is also 
the time for mental growth, and growth in knowledge of school sub- 
jects. If, in longitudinal studies, we plot curves showing growth, 
or yearly increments, over a number of years, we would necessarily 
find a relatedness in all the functions measured because elapsed 
time is common to all the measures. 

But children do not all grow at the same rate. All do not reach 
the same point at age seven, for example. At age seven, some have 
MA’s of 8, 9, and 10; others have MA’s of only 6, 5, or 4. Some 
are as tall as the average eight-, nine-, or ten-year-old; others are 
no taller than the average six-, five-, or four-year-old. They like- 
wise vary in all other aspects of physical growth and in academic 
achievement. Everywhere we know that academic achievement is 
related to mental growth. If we pick out the brightest fourth and 
dullest fourth of our seven-year-old group, we will find that we 
also have the highest and lowest fourths in achievement, and that 
the differences in achievement will be large. This is what we mean 
by saying that mental development and educational development 
are related one to the other. We can predict this in advance for 
any normal group of seven-year-old pupils. This, again, is what we 
mean by related. 

If physical growth is to be of any help, those same seven-year- 
old pupils who are advanced in educational development and men- 
tal development must turn out to be the tallest, heaviest, have the 
most teeth, the best-developed bones, the strongest grip, and so on. 
Those who are retarded in education and mental development must 
show underdevelopment in these physical traits and in significant 
amounts. If this does happen, we can say physical development is 
related to educational and mental development, and we can demon- 
strate the presence and amount of relationship by statistics. If we 
cannot demonstrate the relationship statistically, we have no right 
to say they are related. 
Research on this important problem dates back to around 1900. 
All the measures of physical growth that have been here mentioned 
have been correlated with measures of mental and educational 
growth—most of them several times—by various investigators. 
These investigations have most generally yielded correlation co- 
efficients at or near zero, when CA is held constant. By contrast, 
investigators everywhere have found substantial and significant 
relationships between measures of mental growth and educational 
development. Both of these generalizations are so well known and 
commonly agreed upon, it seems unnecessary to treat this subject 
historically or to cite extensively the experimental investigations 
(1). Height age and weight age, and they alone of the physical meas- 
ures, have been found to correlate significantly with MA, CA held 
constant. The average of the obtained coefficients is something on 
the order of .20. Predictions made on correlation coefficients of 
this magnitude are not more than two per cent better than chance. 
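
The text does not say how the figure of two per cent was reached. One reading, offered here as an assumption, is the index of forecasting efficiency, 100(1 - sqrt(1 - r^2)), which gives the per cent reduction in the error of estimate relative to a guess made without the predictor. A minimal Python sketch, with a hypothetical function name:

```python
import math

def forecasting_efficiency(r):
    """Per cent reduction in error of estimate: 100 * (1 - sqrt(1 - r^2))."""
    return 100.0 * (1.0 - math.sqrt(1.0 - r * r))

for r in (0.20, 0.60, 0.90):
    print(f"r = {r:.2f}: {forecasting_efficiency(r):.1f} per cent better than chance")
```

On this reading a coefficient of .20 yields about two per cent, and even a coefficient of .60 yields only twenty per cent.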

Dearborn and Rothney report coefficients between intelligence 
test scores and various measures of physical growth for five hun- 
dred and thirty-three sixteen-year-old boys as follows (2): 


              Standing 
              Height    Weight   Iliac   Chest Depth   Chest Width 
Intelligence   .224      .137     .078      .060          .138 


The multiple correlation between intelligence test scores and the 
five physical measures was found to be R = .247. 

The writers have access to some data collected for another pur- 
pose by Choitz (3). These include Otis MA’s, reading scores, arith- 
metic scores, height age scores, weight age scores, and dental age 
scores, among others, for a sample of pupils in grades 4, 5, and 6— 
about forty pupils in each grade. By a within-grade design, which 
tends to hold chronological age more or less constant, intercorre- 
lations and multiple correlations were computed. Multiple corre- 
lations were obtained as follows: 


             MA alone   MA, HA   MA, HA, WA   MA, HA, WA, DA 
Reading        .6451     .6508      .6543         .6592 
Arithmetic     .5507     .5520      .5559         .5560 


The total effect of the addition of the three physical measures, 
HA, WA, and DA raised the correlation coefficient between reading 
achievement and MA and arithmetic achievement and MA by 
approximately .01 (4). It is likely that the addition of GA, CA, 
and VcA would have raised the multiple correlation by perhaps 
a like amount. A change of this amount in a coefficient has neither 
statistical nor practical significance. These findings are in agree- 
ment with those obtained by Gates many years ago (5). He ob- 
tained a correlation of .60 between MA and educational achieve- 
ment scores. A multiple correlation procedure involving, in addition 
to MA, bone ossification ratio, height, weight, chest girth, lung 
capacity, grip, and nutritional status, yielded a coefficient of .63, 
an increase of .03. 
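
The size of these increments can be made concrete with a little arithmetic on the Choitz coefficients quoted above. The sketch below is illustrative only: the dictionary layout and function name are assumptions, and the squared coefficients are offered as the usual "proportion of variance" reading rather than as a computation reported by the writers.

```python
# Multiple correlations with achievement, copied from the Choitz table above.
MULTIPLE_R = {
    "Reading": {"MA": 0.6451, "MA+HA+WA+DA": 0.6592},
    "Arithmetic": {"MA": 0.5507, "MA+HA+WA+DA": 0.5560},
}


def gain_from_physical_measures(r_ma: float, r_all: float) -> tuple[float, float]:
    """Return (gain in R, gain in R squared) from adding HA, WA and DA to MA."""
    return r_all - r_ma, r_all ** 2 - r_ma ** 2


if __name__ == "__main__":
    for subject, r in MULTIPLE_R.items():
        d_r, d_r2 = gain_from_physical_measures(r["MA"], r["MA+HA+WA+DA"])
        print(f"{subject}: R rises by {d_r:.4f}, R squared by {d_r2:.4f}")
```

For reading the gain in R is about .014 and for arithmetic about .005, which is the order of magnitude described in the text.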

Proponents of OA sometimes write in such a vein as to suggest 
a belief in some kind of general urge to grow, or some kind of gen- 
eral growth force, which all growth processes must obey. It appears 
as if a child who has only eighty per cent of a normal growth force 
will grow only eighty per cent as fast as a child with a normal 
amount of this element; or, one with one hundred and twenty per 
cent of normal will grow at a rate of one hundred and twenty per 
cent. Obviously there is no single force which both mental and 
physical growth factors must obey and obey alike, because as we 
have seen, the rates of the two kinds of growth are essentially un- 
related. 

There is some relatedness in rate of growth among various phys- 
ical measures; and also among the various traits that we speak of 
as making up intelligence. Also, with respect to each set of meas- 
ures, growth tends to be regular and continuous; tends to proceed 
at the rate at which it starts out. Correlation coefficients of .80 and 
.90 have been obtained between measures of height taken several 
years apart, say at age seven and at age seventeen (6). Coefficients 
of .80 to .90 have also been reported between scores on intelligence 
tests obtained ten years apart (7, 8). 

Rate of growth, physical and mental, is probably governed both 
by hereditary and environmental factors. Heredity is set in the 
young infant, actually at conception, and remains constant. While 
environmental factors are, of course, subject to change, they prob- 
ably remain fairly constant for most children. These two factors 
should give stability and regularity to growth. A child who is go- 
ing to be large when he grows up, is large relative to others of his 
age all along—at six, twelve, or eighteen. Similarly, one who is 
going to be small at maturity is small at all stages of growth. This 
is the meaning of the positive correlations obtained between growth 
measures taken years apart. A child who is going to be bright or 
dull at maturity is correspondingly bright or dull at six, twelve, 
or eighteen. 

Among physical traits there is no compelling necessity for a large 
or small child to be large or small all over. This is reflected in the 
fact that we think it desirable to utilize several different measures 
of physical growth. However, over-all, the various measures tend 
to be positively intercorrelated. The writers obtained (with Choitz’s 
data) the following intercorrelations: HA and WA, .57; HA and 
DA, .24; WA and DA, .10. Dearborn and Rothney obtained an 
average intercorrelation of .57 among Standing Height, Weight, 
Iliac, Chest Depth, and Chest Width. Intercorrelations among 
most of the measures that are commonly associated with OA appear 
to be considerably lower than this average. Lowell and Woodrow 
reported a coefficient of .20 between carpal age and number of 
permanent teeth (9). Gates obtained coefficients between carpal 
scores and lung capacity of .30, and strength of grip of .25; between 
strength of grip and height, .45, weight, .40, lung capacity, .46. 
Thus, as Paterson observed, “Various aspects of physical growth 
are shown to be far from unitary.” 

We see that the unity of growth, one of the tenets of OA theory, 
is a somewhat questionable concept when applied to physical 
growth alone. It has very little meaning with respect to ‘total 
growth’. Proponents of OA theory have stated that the various 
aspects of growth may be viewed as interchangeable samples of 
total growth. It is further stated that “.... the various attributes 
in an individual tend to cluster about a center of gravity of growth 
of that individual and that the freedom to vary is restricted (10).” 

In exposition of this kind a writer is not ordinarily concerned with 
expressing a view as a testable hypothesis. When one interested 
in making a test tries to restate it in such a way that it may be 
tested statistically, one runs the risk of making a statement that 
is at variance with the author’s intended meaning. The statement 
just quoted seems to say that the various aspects of growth are 
positively interrelated and suggests the interpretation that any 
sample of K aspects of growth may be substituted for any other 
such aspects in any prediction based on such measures as the one 
intended. This interpretation may not be precisely the one intended, 
but it must be admitted that it is consistent with and follows 
naturally from the general OA concept. We have already examined 
this interpretation in the light of evidence yielded by correlation 
studies. We shall now present findings derived from the 
application of a somewhat different type of statistical technique. 

This technique as we applied it involved an analysis of the total 
variability among three selected growth measures for the pupils in 
each of three grades, again using Choitz’s data. The three growth 
measures so employed were chosen to represent each of the three 
major types of growth measures included in the OA index. One was 
a measure of mental growth, Mental Age; one was a measure of 
physiological growth, Weight Age; and one was a measure of ana- 
tomical growth, Dental Age. These measures were admittedly se- 
lected because of their known low intercorrelations. However, this 
choice is not in any way inconsistent with the claim that the var- 
ious aspects of growth are interchangeable samples of total growth. 

It may be argued that the use of only three growth measures, 
as compared with the greater number commonly utilized, does not 
provide a valid test of the general hypothesis or ‘view’ in question, 
i.e., that K should be greater than three. To this it can be said that 
one method of testing a hypothesis is to determine whether or not 
there exist any specific exceptions to it. Failure to discover such 
exceptions does not prove the hypothesis but, as possible exceptions 
are eliminated, confidence in the hypothesis develops. On the other 
hand, the discovery of even a single exception does discredit the 
hypothesis, at least to the extent of demanding its revision to allow 
for this exception. 

It was with these principles in mind that analysis was made of 
MA, WA, and DA scores which would test the tenet of ‘unity of 
growth’, i.e., that the various attributes in a given individual tend 
to cluster about a center of gravity with a concomitant restriction 
of variation. 

The technique employed is that of component analysis, which 
has been developed in connection with the technique of analysis 
of variance. In a general way, this technique analyzes the total 
variation among all the growth measures available for all the in- 
dividuals into component parts, and provides an estimate of the 
proportions of this total variation which may be ascribed to each 
part. The total variation was analyzed into two components; 
namely, one, a component reflecting the variation from individual 
to individual among the means of the three growth measures, and 
the other a component reflecting the variations among the three 
growth measures within the respective individuals. In this case 
the total variation referred to was determined separately for each 
of the grade levels—fourth, fifth, and sixth. 

It will be noted that the first component cited, in effect, deals 
with variations in OA from child to child, insofar as OA can be 
estimated from a pooling of three samples of it. The second com- 
ponent reflects the variation among an individual child’s growth 
measures. It, in effect, represents an averaging of such variations 
obtained separately for each child. Or, in the language of the pro- 
ponents of OA theory, the first component represents variation 
from center of gravity to center of gravity of the different children. 
The second component represents a sort of average of the variations 
among the various scores which comprise each child’s cluster. 

If the view under test is to prevail, it necessarily follows that 
the first of these components must account for a significant and 
substantial proportion of the total variation among the growth 
measures. The per cents associated with each grade group are as 
follows: 


Grade    1st Component    2nd Component
  4           9%               91%
  5          12%               88%
  6          15%               85%


Thus the proportion of total variation accounted for by the first 
component, among individuals, is far from substantial. Since so 
large a proportion of the total variation is accounted for by the 
second component, within individuals, it would appear that, in 
this case, the growth measures are not interchangeable samples of 
total growth and that freedom to vary is not restricted—certainly 
not restricted much.¹ 
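
A two-component breakdown of this kind can be sketched in Python as follows. The writers do not give their computing formulas, so the usual one-way random-effects estimators are assumed here; the function name and the demonstration scores are hypothetical and are not Choitz's data.

```python
from statistics import mean


def variance_components(scores):
    """Split total variation into between-child and within-child shares.

    scores: one row per child, each row holding that child's k age scores
    (for example MA, WA and DA in months). Every row must have the same length.
    Assumption: standard one-way random-effects estimators are used.
    Returns (share_between, share_within), which sum to 1.
    """
    n, k = len(scores), len(scores[0])
    child_means = [mean(row) for row in scores]
    grand = mean(child_means)
    ms_between = k * sum((m - grand) ** 2 for m in child_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, child_means) for x in row
    ) / (n * (k - 1))
    # A negative estimate of the between-child component is set to zero.
    var_between = max(0.0, (ms_between - ms_within) / k)
    total = var_between + ms_within
    if total == 0:
        raise ValueError("scores show no variation at all")
    return var_between / total, ms_within / total


if __name__ == "__main__":
    # Hypothetical age scores in months (MA, WA, DA) for four children.
    demo = [[118, 131, 109], [126, 112, 135], [104, 127, 121], [133, 115, 124]]
    between, within = variance_components(demo)
    print(f"between children: {between:.0%}, within children: {within:.0%}")
```

When each child's three scores scatter widely about that child's own mean, nearly all of the variation falls in the second share, which is the pattern reported in the table above.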

Moreover, if we may assume that the MA, WA, and DA scores 
for a child constitute a random sample from a normally distributed 





1 It will be noted, inasmuch as the same three age scores were determined 
for each child, that a component analysis involving the proportions of total 
variation which may be ascribed to (1) variation in OA from child to child, 
(2) variations in mean age from attribute to attribute and (3) interaction 
between OA and attribute could have been employed. It was felt, however, 
that the two-component analysis selected represented a model more con- 
sistent with the general ‘views’ under examination than the three-com- 
ponent analysis here mentioned. 
population or complete cluster of such growth scores,² then it is 
possible to test the statistical significance of the differences among 
the means from child to child. These differences were not found to 
be significant at any of the three grade levels studied. In other 
words, differences among children’s OA scores may be explained 
in terms of chance or accidental variations among the individual 
growth scores of the children involved. If this is the case, then the 
true centers of gravity based on theoretically complete clusters of 
scores will not differ from child to child within a school grade group 
and hence, OA scores which are but estimates of these true centers 
of gravity do not constitute a meaningful basis for discriminating 
one child from another. 

Since it is obvious, further, that the children in a grade group 
do differ with respect to single growth characteristics, the fact that 
these same children do not differ significantly in terms of the aver- 
age of a number of growth characteristics, is evidence of nothing 
more than the chance association of individual growth measures 
within an individual child. There is no systematic tendency, in 
other words, for a child advanced in Mental Age to be advanced 
also in Weight Age or Dental Age. 
Evaluation of corrective reading treatment, like other therapies, 
is hampered by the requirement of humane treatment of subjects. 
Rigorous experimental design is therefore difficult to maintain. 
Nevertheless, criticisms (8, 13) of current evaluative studies are 
justified. Since assignment of subjects to experimental and control 
groups is based on convenience (4) or self-determination (14), ma- 
jor selective factors, e.g., motivation of subjects in each group, are 
often left uncontrolled. (The citations are merely representative; 
a number of others might have been used.) Furthermore, changes 
in test performance are reported as changes in ability, a question- 
able assumption. 

Further criticisms of current assessment studies relate to the 
utilitarian aim of effect on academic achievement (8) and perma- 
nence of performance gains, i.e., the ‘hothouse’ effect (1, p. 74). 
While slight differences in grade point average greater than those 
of control groups have been reported recently (2, 4, 5, 14), an over- 
whelming number of writers in the past either have not reported 
grade changes or have found little or no increase in academic status 
concomitant with increase in reading performance (3). Whether 
failure to report was due to oversight, lack of data, or absence of 
positive results is unknown. 

The present study was designed with the above criticisms in 
mind. An attempt was made to determine changes in reading per- 
formance, over a brief and over an extended interval, and in aca- 
demic status of students who volunteered or were referred for a 
corrective reading course at a state university. 


PROCEDURE 


Subjects of the study were seventy-four male and female fresh- 
men of the University of Michigan who came voluntarily or by 
referral to the Reading Improvement Division of the Bureau of 
Psychological Services during its first semester of operation, and 
who completed the course.¹ Another twenty-one freshmen who 
attempted to register could not be enrolled due to lack of facilities. 
This group, equivalent to the experimental subjects in ACE Psy- 
chological Examination for College Freshmen mean scores, con- 
stituted a motivated, non-treatment control group. A third group 
comprised a representative sample of freshmen. This group was 
assembled by choosing every fiftieth name from an alphabetical 
list of freshmen. 

A further sample of thirty was drawn randomly from the ex- 
perimental group of seventy-four for follow-up testing to determine 
permanence of gain in reading performance sixty weeks after ter- 
mination of the course. 

The training procedures in the reading laboratory were similar 
to those used in most collegiate programs. Students met in small 
groups for two one-hour sessions each week for ten weeks and were 
encouraged to work by themselves in the laboratory for two one- 
half-hour periods. There was considerable variation in the extent 
to which this suggestion was followed. One of the weekly hour 
meetings occurred in a seminar room and was devoted to instruc- 
tion in the psychology of reading, e.g., methods for attaining rate 
flexibility, in study techniques, e.g., utilization of Francis P. Robin- 
son’s Survey Q3R method (7), and in preparing for and writing 
essay and short answer examinations. Discussion of common prob- 
lems was usual, and each session included mimeographed exercises 
concerning vocabulary, comprehension skills, critical reading or the 
like. It was concluded with a timed reading from Ruth Strang’s 
Study Type of Reading Exercises (12) or from Wilking and Web- 
ster’s A College Developmental Reading Manual (15). The second 
hour was spent in supervised laboratory practice with mimeo- 
graphed work sheets, fiction and non-fiction books, textbooks, pac- 
ing devices and a tachistoscope. The skills and principles have been 
described elsewhere (10). 

Changes in reading performance and in academic status concomitant 
with or following treatment were assessed as follows: 

1) Reading Performance: Coöperative Reading Test C2 (Vocabulary, 
Level of Comprehension, Speed of Comprehension). Traxler 
High School Reading Test, Part I (Rate, Comprehension). 

2) Academic Status: Registrar’s record of grade-point average 
(GPA). Probationary status. Forced withdrawal. 

A final category, voluntary withdrawal, includes those subjects 
who, at the time of transfer or withdrawal, were maintaining an 
acceptable grade point average of 2.0 or above. 


1 Attrition was less than ten per cent. While such a loss might distort 
the results, it appears from present evidence of the character of drop-outs 
that such distortion would depress rather than inflate results. Those who 
drop out tend to be ‘fast improvers’ (9). 


TABLE I. CHANGES IN MEAN READING PERFORMANCE OF COLLEGE 
STUDENTS AFTER INTERVALS OF TRAINING 
(TEN WEEKS) AND NO TRAINING (SIXTY WEEKS) 

Coöperative Reading Test C2 (N = 27) 

                              Initial                Final                 Follow-up
Skill                     Score    σ    %ile†    Score    σ    %ile†    Score     σ    %ile†
Vocabulary                54.3    7.1    38      55.9    6.9    45      56.5     7.6    40
Level                     55.9    6.6    49      58.9    6.0    55      61.4*    7.1    60
Speed of Comprehension    56.5    8.2    40      62.6*  11.3    62      64.8**  10.6    69

* Increase over initial mean significant at .05. 

** Increase over initial mean significant at .01. 

† Percentile equivalents are based upon test manual norms as follows: 
Initial and Final: “entering freshmen;” Follow-up: “freshmen (end of 
year).” 


RESULTS 


Changes in reading status and permanence of gains are reported 
in Tables I and II and are presented graphically in Figure 1. Table 
I indicates the Coöperative Reading Test C2 mean scaled scores 
(standard scores by which equivalence among forms is attained) 
of experimental subjects at the initial (pre-training), final 
(post-training) and follow-up (lapse of 60 weeks) testings. Of the 
thirty subjects drawn at random from the original seventy-four,² 
two had voluntarily withdrawn from the University at the end of 
their third semester and one declined to coöperate. Changes in 
reading performance were negligible for Vocabulary, while Level of 
Comprehension differed significantly (.05) between Initial and 
Follow-up testing. Speed of Comprehension, the skill most sensitive to the 





2 No differences approaching significance were found between the sample 
of thirty and the total group of seventy-four in reading scores. 
TABLE II. CHANGES IN READING PERFORMANCE OF COLLEGE STUDENTS 
AFTER INTERVALS OF TRAINING (TEN WEEKS) AND NO TRAINING 
(SIXTY WEEKS) 

Traxler High School Reading Test (N = 27) 

                          Initial             Final              Follow-up
                          (Form A)    σ*      (Form B)    σ*     (Form A)    σ*
Rate (wpm)                 219.4     37.7     272.2**    55.9    297.2**    85.7
Comprehension (%)           75.6     10.2      70.2      12.2     76.3      12.1
Rate of Comprehension      165.9              191.1              226.8

* Differences in σ of Rate for Initial-Final and Final-Follow-up are 
significant at .05. 
** Increase over Initial Rate significant at .01. 



training methods used, showed increases of 22 and 29 percentiles 
over ten weeks and sixty weeks, respectively. (For the total group, 
N = 74, all increases were significant at or beyond .05 at the end 
of training.) 

Table II includes similar results recorded on the Traxler High 
School Reading Test. An initial rate of 219.4 words per minute 
rose to 272.2 (P < .01) and 297.2 (P < .01) at the final and 
follow-up testings, respectively. At the same time comprehension 
(immediate retention of story content) showed negligible change.³ 
It is of interest to note the continued large increases in standard 
deviation (P < .05), a result conforming to the principle that one 
effect of formal education is an increase in individual differences. 
Rate of Comprehension is an index of change which minimizes the 
effect of an undue increase in rate at the expense of understanding. 
It provides a useful index of progress despite certain statistical 
limitations. 
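
The Rate of Comprehension index is not defined explicitly in the text, but the values in Table II agree with the product of mean rate and mean proportion comprehended. The Python sketch below rests on that inferred formula; the function and variable names are assumptions made for illustration.

```python
# Mean rate (words per minute) and mean comprehension (per cent) from Table II.
TABLE_II = {
    "Initial": (219.4, 75.6),
    "Final": (272.2, 70.2),
    "Follow-up": (297.2, 76.3),
}


def rate_of_comprehension(wpm: float, comprehension_pct: float) -> float:
    """Words per minute weighted by the proportion comprehended (inferred formula)."""
    return wpm * comprehension_pct / 100.0


if __name__ == "__main__":
    for testing, (wpm, pct) in TABLE_II.items():
        print(f"{testing}: {rate_of_comprehension(wpm, pct):.1f}")
    # prints 165.9, 191.1 and 226.8, the values given in Table II
```

Because the index multiplies rate by comprehension, a gain in speed bought at the cost of understanding is discounted, as in the Final testing, where comprehension fell from 75.6 to 70.2 per cent.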

Figure 1 illustrates the same changes in performance during the 
training program and the permanence of gains. Vocabulary scores 
showed no change. Speed of Comprehension scores increased most 
rapidly over ten weeks and continued to increase as did Level of 
Comprehension scores and the Traxler Rate of Comprehension index. 
Since the Coöperative Reading Test C2 results are presented in 





3 An independent study of forms A and B of the Traxler High School 
Reading Test casts doubt on their assumed equivalence when used with 
adults (11). 
Fig. 1. Reading Performance of Experimental Subjects after Ten Weeks 
of Training Followed by Sixty Weeks of No Training. 


the figure as percentile equivalents, the differences reflect increases 
beyond those normally anticipated as a result of the academic 
experience alone, e.g., while a scaled score of 63 in Speed of Com- 
prehension is equivalent to the 55th percentile for entering fresh- 
men, a score of 64 is equivalent to the same percentile at the end 
of the year. 

Changes in academic achievement are reported in Table III. 
While no significant change occurred from the first to the second 
semester in the grade point averages of the Control and Representa- 
tive Freshmen groups, a significant increase (P < .01) occurred in 
the Experimental group. A GPA of 2.0 corresponds to a grade of 
C, 3.0 to a B, etc. It will be noted that the grade point averages 
of the second semester do not include the standing of students with- 
drawing from the University. Therefore, the results are probably 
conservative estimates of the true differences among the three 
groups. 

While Experimental and Control differed by .11 (P < .05) in 
status at the end of the semester during which training occurred, 
the difference increased to .25 (P < .01) in favor of the Experimen- 











156 The Journal of Educational Psychology 


TABLE III. COMPARATIVE MEAN ACADEMIC ACHIEVEMENT AMONG 
EXPERIMENTAL, CONTROL AND REPRESENTATIVE FRESHMEN GROUPS 

                  1st Semester       2nd Semester*
Group             N     G.P.A.       N     G.P.A.      Diff.     Sig.
Experimental      74     2.25        71     2.41       +.16      P < .01
Control           21     2.14        17     2.16       +.02      NS
Freshmen          46     2.29        40     2.22       -.07      NS

                             1st Semester               2nd Semester
                             G.P.A. Diff.    Sig.       G.P.A. Diff.    Sig.
Experimental v Control           .11         P < .05        .25         P < .01
Experimental v Freshmen          .04         NS             .19         P < .01

* Does not include subjects who withdrew from the University. 


tal group by the end of the second semester. A similar increase in 
achievement of the Experimental group over the Representative 
Freshmen group is shown (P < .01). 
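
The differences cited in this paragraph follow directly from the group means in Table III. The short Python sketch below recomputes them; the data layout and function names are assumptions made for illustration, and no significance test is attempted because the group standard deviations are not reported.

```python
# Mean grade point averages (1st semester, 2nd semester) from Table III.
GPA = {
    "Experimental": (2.25, 2.41),
    "Control": (2.14, 2.16),
    "Freshmen": (2.29, 2.22),
}


def change(group: str) -> float:
    """Second-semester mean minus first-semester mean for one group."""
    first, second = GPA[group]
    return round(second - first, 2)


def gap(group_a: str, group_b: str, semester: int) -> float:
    """Mean of group_a minus mean of group_b; semester is 0 (first) or 1 (second)."""
    return round(GPA[group_a][semester] - GPA[group_b][semester], 2)


if __name__ == "__main__":
    for name in GPA:
        print(f"{name}: change {change(name):+.2f}")
    print("Experimental v Control:", gap("Experimental", "Control", 0),
          gap("Experimental", "Control", 1))
    print("Experimental v Freshmen:", gap("Experimental", "Freshmen", 0),
          gap("Experimental", "Freshmen", 1))
```

The first-semester gap between the Experimental and Freshmen groups comes out at -.04, that is, the Experimental group began slightly below the representative sample; Table III reports the magnitude only.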

Amount and kind of academic deficiency are reported in Table IV. 
Scholastic aptitude, inferred from group mean raw scores (con- 
verted to percentile equivalents) of the ACE Scholastic Aptitude 
Test for College Freshmen, was similar for the three groups. How- 
ever, the Experimental group included a significantly smaller per- 
centage on probation than did the Freshmen (10.8 per cent and 
32.6 per cent, P < .05). The Experimental group also included a 
significantly smaller percentage in the combination category of 
probationary and forced withdrawal than did either the Control 
group (12.2 per cent and 33.3 per cent, P < .05) or the Freshmen 
group (12.2 per cent and 41.3 per cent, P < .01). 
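
The writers do not say which test produced these probability levels. One common choice for comparing two percentages is a pooled two-proportion z test, sketched below in Python as an assumption rather than as the method actually used. The counts (8 of 74 and 15 of 46 on probation) are inferred from the reported percentages and group sizes, and the function name is hypothetical.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z statistic and its two-sided normal p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # area in both tails
    return z, p_value


if __name__ == "__main__":
    # Probation: 10.8 per cent of 74 (about 8) v 32.6 per cent of 46 (about 15).
    z, p = two_proportion_z(8, 74, 15, 46)
    print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

Under these assumptions the difference in probation rates lies well beyond the .05 level, which is consistent with the significance reported in the text.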

The superiority of the Control group over the Representative 
Freshmen, while not significant, suggests the commonly suspected 
difference in motivation, ‘fear-induced drive’, aspiration-level, or 
the like, between those who apply to a reading service and those 
who do not apply. 
TABLE IV. COMPARATIVE MEAN ACADEMIC DEFICIENCY OF EXPERIMENTAL, 
CONTROL AND REPRESENTATIVE FRESHMEN GROUPS (2ND SEMESTER) 

                                            % Below    % Forced      % Total       (% Voluntary
Group*          N      ACE %ile             2.0**      Withdrawal    Deficiency    Withdrawal)
Experimental    74     56    40    46       10.8         1.4           12.2           (8.7)
Control         21     60    40    48       23.8         9.5           33.3           (9.5)
Freshmen        46     45    51    49       32.6         8.7           41.3           (4.3)

                             Below 2.0              Total Deficiency
                             Diff.      Sig.        Diff.      Sig.
Experimental v Control       13.0%      NS          21.1%      P < .05
Experimental v Freshmen      21.8%      P < .05     29.1%      P < .01

* All groups were nearly identical at the mean in academic load. 
** Does not include subjects categorized under ‘forced withdrawal’. 


DISCUSSION 


Within the limits of sample size, subject characteristics and treat- 
ment methods, it may be concluded that changes did occur in 
reading performance and academic status concomitant with and 
following application of the corrective techniques used. It is the 
feeling of the writers that fulfillment of certain conditions may have 
been responsible for the results. First, it may be too much to ex- 
pect that improved reading performance in a laboratory will result 
in improved grades. The appropriate skills to be developed seem 
to be those directly involved in the study and examination situa- 
tions. Second, if we are to expect permanence of improvement in 
reading performance, it seems apparent that provision must be 
made for continued practice in attempting to read efficiently. The 
reader will recognize the two conditions as applications of familiar 
learning principles relating to transfer and performance. 

While the above conditions may be necessary, they are probably 
far from sufficient to bring about the maximum contribution to 
students. It is our opinion that important factors hardly touched 
as yet are those of personality structure-learning interaction (9), 
perceptual development, treatment variables (especially the 
instructor as a variable) and diagnosis. 


SUMMARY AND CONCLUSIONS 


Evaluations of collegiate reading improvement services have been 
justifiably criticised for inadequate experimental design. The present 
study concerns changes in reading performance, permanence 
of gains and concomitant changes in academic status of clients of a 
university reading service. An attempt was made to control the 
main variables confounding the results of previous investigations. 

Major conclusions are as follows: 

1) Significant gains in performance (though not necessarily in 
skill) result for those aspects of reading which are emphasized in 
training. 

2) Performance gains are maintained and, possibly, increased 
after a lapse of time (sixty weeks) with no formal training when 
continued practice is encouraged. 

3) Significant superiority in academic status (increasing with 
time) is demonstrated by experimental subjects over both control 
and representative freshmen subjects when study and examination 
skills are emphasized during the training period. 
PROBLEM-SOLVING BY TEAMS AND BY 
INDIVIDUALS IN A FIELD SETTING¹ 


IRVING LORGE, JACOB TUCKMAN, LOUIS AIKMAN, 
JOSEPH SPIEGEL AND GILDA MOSS 


Teachers College, Columbia University 


In a previous study, the relationship between the method of 
presentation of a field problem and the quality of the written solu- 
tions was investigated (1). The field problem employed, the Mined 
Road Problem (2), adapted from that developed by the Office of 
Strategic Services, requires the formulation of a plan of action for 
getting a group of five men across a road mined with supersensitive 
enemy mines which can neither be neutralized nor dug up. 
The road is twelve feet wide, bordered with trees about forty feet 
tall. Scattered about one side of the road is a variety of potentially 
usable material and debris, including beams, boards, ropes, dis- 
carded auto tire, pulley, etc. As adapted, the Mined Road was 
presented at four laboratory levels of remoteness from reality: (1) 
verbal description, (2) photographic representation, (3) miniature 
scale model not allowing manipulation of parts and materials, and 
(4) miniature scale model allowing manipulation of parts and 
materials. Teams of five men and individuals as individuals were 
required to solve the problem at each of these four levels of 
presentation and write out their solutions. 

The evidence indicated that the written solutions at the four 
laboratory levels of remoteness from reality for teams and for in- 
dividuals were equivalent, i.e., not statistically different from each 
other. The written solutions by teams, however, were markedly 
superior to those by individuals at each of the four laboratory 
levels of presentation. The apparent equivalence in goodness of 
solutions among the laboratory levels of presentation was attrib- 
uted in part, at least, to the fact that teams and individuals at 
each laboratory level were given the opportunity to equalize the 
amounts of information about all aspects of the problem, either by 
perception directly or by questioning indirectly. The superiority 





1 Studies in Reality. No. 2. Conducted under contract Number AF 18 
(600) 341 of the Human Resources Research Institute, Maxwell Air Force 
Base, by the Institute of Psychological Research, Teachers College, Colum- 
bia University, Principal Investigator: Irving Lorge. 
of team solutions over those of individuals, to a degree, may be 
attributable to the fact that teams of five members asked more 
questions than did single individuals, thus, increasing the likeli- 
hood of attaining better solutions because of more information. 

The laboratory settings in which the Mined Road Problem had 
been presented were quite remote from actual reality. The ques- 
tion that must be asked is whether differences would be obtained 
between settings more closely approximating reality and whether 
variations in conditions approximating reality would be signifi- 
cantly different from each other. If the Mined Road Problem were 
developed in actual reality in the field and individuals and teams 
were required to solve it in such a setting, it is likely that solutions 
superior to those in the laboratory would come because of the more 
direct knowledge by direct perception. If the field setting were 
varied so that the amount of direct information by perception 
could be changed, then, it is quite likely that such variations would 
introduce differences in the solution. For instance, one variation 
may be to allow the subjects just to see the actual field situation 
without touching any of the equipment around it. In such circum- 
stances, less direct information may be obtainable. Or, again, the 
subjects may be allowed to manipulate any or all aspects of the 
materials but not allowed to carry a solution to completion. It is 
possible that they would get more information by the direct manip- 
ulation but less than if they had completed the solution. Therefore, 
it is hypothesized that there will be a genuine difference in the 
quality of solutions between those made in laboratory settings and 
those made in field settings and, moreover, among the several 
possible levels of field presentations there would be differences at- 
tributable to differences in the amounts of information obtainable 
directly by perception. 

The Mined Road Problem was constructed in reality in open 
country. Three variations were possible with a real field setting: 

1) The Actual Real Without Manipulation, where men were not 
permitted to manipulate the beams, tire, ropes, etc. 

2) The Actual Real With Manipulation, where the men were per- 
mitted to manipulate all equipment but not to carry a solution to 
completion, and 

3) The Actual Real Solve, where the men were actually required 
to carry the solution to completion, i.e., to actually cross the road. 

The description of the problem was identical with that used in 
the earlier study, except for such modifications as were necessary 
to meet the conditions of the field levels of presentation. For the 
actual real without manipulation, the description was as follows: 


You men are on your way back from a sabotaging mission in enemy-occupied 
territory. You have just blown up a bridge about a mile from here. 
According to a prearranged plan, you will meet a guerrilla truck about a 
mile away from here which leaves you only a very short time to get across 
this road. You have discovered that this road has been mined with a new 
type of enemy mine which is supersensitive and will blow up if anything 
touches it. The explosion at the bridge has aroused the enemy but as yet 
they do not know in what direction you are going. Your problem is to get 
the entire group of five men across the road and to leave as little trace as 
possible of your escape route. You must cross the road at this spot, which 
is marked off by the white tape. 

[RO] In determining your plan of action, you may observe but not manipulate 
any equipment in the working area, which is marked off by the white tape. 

After you have decided how you are going to cross the road, one man is 
to write your plan of action on the paper provided. 

You may ask any questions you wish and we will try to answer them as 
best we can. Remember, it is important that you complete this mission as 
quickly and with as little loss of men and material as possible. 

According to your schedule you must meet the guerrilla truck in one 
hour. Remember, also, that your performance on this mission will be compared 
with that of other members of Air Force Reserve Officers Training 
Corps personnel. 


For the actual real with manipulation, the following paragraph 
was substituted for paragraph RO: 

In determining your plan of action, you may manipulate any equipment 
found here in the working area, which is marked off by the white tape. 
However, at no time may any man or equipment touch or extend over the 
road. 


For the actual real solve, paragraph RO was omitted. 

The subjects were cadets of the Air Force Reserve Officers Train- 
ing Corps at Manhattan College, a corps of men fairly homogene- 
ous in background and intelligence and highly motivated to work 
with tactical problems. Ten teams of five men and ten individuals 
working alone, randomly selected and assigned, were set the task 
of solving the problem at each of the three field levels. If the 
problem was given for team solution, the five team members were 
brought as a group to the field and each was given a typed problem 
to read to himself while the examiner read it aloud. If the problem 
was presented for individual solution, only that person was brought 
to the field and given the typed description to read to himself while 
the examiner read it aloud. Teams and individuals were permitted 
and encouraged to ask questions. 

For the actual real without manipulation and for the actual 
real with manipulation, teams and individuals proceeded with their 
interrogations and deliberations until a solution to the problem was 
ready to be written. Then either the individual or a member of 
the team wrote the solution. For the actual real solve, the solution 
was written by the individual or by a team member after the road 
crossing had been accomplished. Thus, sixty written solutions were 
obtained, ten by teams and ten by individuals for each of the three 
field presentations. 

The sixty written solutions were evaluated by Quality Point 
Score (QPS), i.e., the aggregate of separate weights for each element 
in the solution. The weights for each element were estimated, giv- 
ing consideration to factors, such as safety, efficiency, workability, 
and quality of thinking. The QPS centered around four major ele- 
ments: (1) Bridging the road, i.e., the use of beams or a combination 
of beams, swinging, broadjumping, throwing men or materials 
across, making an overhead bridge, etc. (2) Removing the bridging 
from the road, i.e., any procedure for removing the basic bridging 
or secondary bridging. (3) Removing clues indicative of escape or 
escape route, i.e., materials hidden or carried away. And (4) time, 
i.e., elapsed time from presentation of the problem to the formal 
written solution. The written solutions for the three field levels of 
presentation contained some new elements which did not appear 
in the solutions at the laboratory levels more remote from reality 
in the earlier study. Each such new element was classified properly 
and assigned a reasonable numerical weight. 

For teams and for individuals, at each of the three field levels 
of presentation, the means and standard deviations of the QPS are 
given in Table 1. The analysis of variance among levels of presen- 
tation and between teams and individuals is presented in Table 2. 

No evidence of significant differences among the solutions at the 
three field levels of presentation was obtained either for teams or 
for individuals. At every field level of presentation, however, the 
solutions by teams were statistically superior to those by individ- 
uals. The apparent equivalence among field levels of presentation, 
as well as the significant differences between teams and individuals, 
are like those reported in the earlier study. 





164 


The Journal of Educational Psychology 





TABLE 1. QUALITY POINT SCORE FOR MINED ROAD SOLUTIONS: MEANS 
AND STANDARD DEVIATIONS, BY TEAMS AND BY INDIVIDUALS AT EACH 
OF THE THREE FIELD LEVELS OF REMOTENESS FROM REALITY: TEN 
TEAMS AND TEN INDIVIDUALS AT EACH LEVEL 

                                        Teams          Individuals 
Level of Remoteness from Reality      Mean    SD      Mean    SD 
Actual Real without Manipulation      44.8    29.0    11.9    14.3 
Actual Real with Manipulation         54.2    29.1    15.5    18.7 
Actual Real Solve                     57.9    16.4    39.6    25.2 

















TABLE 2. QUALITY POINT SCORE FOR MINED ROAD SOLUTIONS: ANALYSIS 
OF VARIANCE BETWEEN TEAMS AND INDIVIDUALS AND AMONG THE 
THREE FIELD LEVELS OF REMOTENESS FROM REALITY 

Source                           Sum of Squares   df   Mean Square   F Ratio 
Class (Teams and Individuals)       13470.0        1     13470.0      23.1* 
Treatment (Level of Reality)         4344.1        2      2172.1       3.7 
Class X Treatment                    1105.0        2       552.5 
Error                               31499.9       54       583.3 
Total                               50419.0       59 

















* Significant at the .01 level. 
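
As a check on the arithmetic of Table 2, the mean squares and F ratios can be recomputed from the printed sums of squares and degrees of freedom. The following is a minimal sketch in Python and is not part of the original analysis; the variable names are illustrative, and the interaction F ratio it produces is computed here and is not reported in the table.

```python
# Sums of squares and degrees of freedom as printed in Table 2.
rows = {
    "Class (Teams and Individuals)": (13470.0, 1),
    "Treatment (Level of Reality)": (4344.1, 2),
    "Class X Treatment": (1105.0, 2),
    "Error": (31499.9, 54),
}

# Mean square = sum of squares divided by its degrees of freedom.
mean_square = {name: ss / df for name, (ss, df) in rows.items()}

# Each F ratio is an effect mean square divided by the error mean square.
ms_error = mean_square["Error"]
f_ratio = {
    name: mean_square[name] / ms_error for name in rows if name != "Error"
}

# The component sums of squares and df should add up to the Total row.
total_ss = sum(ss for ss, _ in rows.values())
total_df = sum(df for _, df in rows.values())

for name, f in f_ratio.items():
    print(f"{name}: F = {f:.2f}")
print(f"Total: SS = {total_ss:.1f}, df = {total_df}")
```

Under these figures the component sums of squares and degrees of freedom add to the printed Total row, and the recomputed F ratios for Class and Treatment round to the printed values.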


Although no significant differences were found among the solu- 
tions at the four laboratory levels of presentation in the earlier 
study and at the three field levels of presentation in the present 
study, the question may be raised whether there are differences 
between levels of presentation in the laboratory and those in the 
field. To answer this question, the analysis of variance was made 
for QPS among the seven levels of presentation. The analysis 
showed that there were no significant differences in QPS among 
the seven levels of presentation, which means no differences be- 
tween laboratory and field. 

A qualitative analysis of the elements in the solutions showed 
some differences between solutions by teams in the laboratory and 
field. Of the teams solving the problem in the field, twenty per cent 
used a diving board method to get men from the starting side to the 
other side of the road, but no teams employed this method in the 
laboratory setting. This diving board involved placing the beam 
on a fulcrum and pushing the beam partway over the road so that one 
or more men could walk to the end of the beam and jump to the 
far side, while the remaining men held down the base end of the 
beam. Twenty per cent of teams in the field used swinging to get 
men across the road. Although the same proportion of teams used 
swinging in the laboratory, there are differences in how the swing- 
ing was done. In the field, all but one team used swinging in con- 
junction with bridging with a beam. In the laboratory, all but one 
team used swinging as the only way of crossing the road, or to set 
up an overhead cable. Only seventeen per cent of field teams, com- 
pared with fifty per cent in the laboratory, used an excessive 
amount of time to solve the problem, thus failing in that they would 
not meet the guerrilla truck. For individuals, there was no differ- 
ence between field and laboratory settings. 

Differences between teams in the field and those in the laboratory 
may be due to differences in perception of the problem situation. 
Despite the differences described above, the elements used in the 
written solutions were basically the same for the laboratory and 
field. 

It was hypothesized that levels more closely approximating ac- 
tual reality in a field setting would be differentiated from presen- 
tations in laboratory settings which are quite remote from reality. 
The rationale for the hypothesis is that more information can be 
obtained directly through perception in the actual field settings 
than indirectly by questioning in laboratory settings. The facts 
indicate that there are no significant differences in solutions be- 
tween laboratory and field levels of remoteness. Moreover, the 
hypothesis that there would be differences among the three field 
levels of presentation because of the differences in the amounts of 
information attainable by direct perception was not supported: 
none of the differences found among the three field levels of 
remoteness was statistically significant. Indeed, there were no 
significant differences among all seven levels of presentation. 

The finding maintained at each level of presentation, both field 
and laboratory, was that groups are superior to individuals in the 
quality of solutions written. Although the evidence in this study 
supports the conclusion that groups are superior to individuals, 
further investigation is necessary to ascertain whether this group 
superiority is attributable to interaction among group members or, 
as has been suggested by Lorge and Solomon, to the greater 
probability of getting at least one solver in any group. Research 
on this phase is now in progress. 
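
The second explanation can be made concrete with a small probability sketch. Assuming, purely for illustration, that each member of a group solves the problem independently and with the same probability p, the chance that a group of n members contains at least one solver is 1 - (1 - p)^n. The function name and the values of p below are hypothetical and are not estimates from this study.

```python
def p_at_least_one_solver(p_individual: float, group_size: int) -> float:
    """Chance that a group contains at least one solver.

    Assumes members solve independently with the same probability.
    """
    if not 0.0 <= p_individual <= 1.0:
        raise ValueError("p_individual must lie between 0 and 1")
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    # Complement of the event that every member fails.
    return 1.0 - (1.0 - p_individual) ** group_size


# Hypothetical individual success rates, for five-man teams.
for p in (0.1, 0.2, 0.3):
    print(p, round(p_at_least_one_solver(p, 5), 3))
```

On this assumption alone, a team of five would be expected to outperform a single individual even if its members never interacted, which is why the two explanations need to be separated empirically.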
PUPILS’ VALUES AND THE VALIDITY OF THE 
MINNESOTA TEACHER ATTITUDE INVENTORY 

Does the degree to which a characteristic of a teacher affects 
his acceptance by a pupil depend on what the pupil wants to get 
out of the relationship? That is, do the values of the pupil determine 
what characteristics of the teacher will influence the pupil’s per- 
ception of him? If so, and if the characteristics of the teacher are 
considered to be determiners of the pupil’s evaluation of him, then 
the values of the pupil may be viewed as ‘second-order determiners’ 
—i.e., determiners of what characteristics will be determiners. 

‘Second-order determiners’ are akin to the kinds of research 
variables that have been espoused by Edwards and Cronbach (8) 
and by the Committee on Criteria of Teacher Effectiveness of the 
American Educational Research Association (1, pp. 253-254). The 
former writers recommended that research on methods of psycho- 
therapy take into account characteristics of the clients, or ‘organ- 
ismic variables’. The AERA Committee referred to such character- 
istics of pupils, which might change how a given teacher affected 
the pupils, as ‘intervening variables’. 

Similarly, social psychologists have conceptualized leadership as 
a relation between leader and follower. How this approach has 
developed is described by Newcomb (15, pp. 650-663), and its 
implementation in research on leadership is illustrated by Sanford 
(16, pp. 328-340). The latter’s study shows how the follower’s 
attitudes, preferences, and needs tell us about his ‘readiness for 
leadership’. 

1 A more complete presentation of the data and instruments of this study 
is available in G. M. Della Piana, “Cognitive-Affective Values of Pupils 
and Teacher-Pupil Relationships,” unpublished Master’s thesis, 1953, on 
file in the University of Illinois Library. The thesis was done at the 
suggestion and under the direction of the second author. A grant from the 
Bureau of Educational Research, College of Education, University of 
Illinois, supported the research. We are grateful to Professors Lee J. 
Cronbach and David R. Krathwohl for a critical reading of the manuscript. 

In this study, we consider the teacher as a leader and we deal 
with one aspect of the teacher’s effectiveness: how his pupils rate 
his competence. This study emphasizes the values of the pupil as 
factors in teacher effectiveness. It was partly suggested by an 
investigation (9) in which the correlation between the Minnesota 
Teacher Attitude Inventory (MTAI) scores of twenty teachers and 
ratings of the teachers by their students was -.18, i.e., negative, 
but not significantly different from zero. How could this result be 
reconciled with other findings (2, 3, 11, 12, 13) of significant positive 
relationships? The two hundred students in the Gage-Suci study 
were the entire enrollment of a university high school. Impression- 
istic evidence suggested that these students had relatively strong 
‘cognitive’ values with respect to teachers. That is, they wanted 
teachers with high ‘cognitive merit’ who could help them satisfy 
their ‘cognitive’, or knowing and understanding, purposes as dis- 
tinguished from their ‘affective’ needs for security and acceptance. 
Such ‘cognitive’ values could render the affective components of 
teacher competence less important to the students. 


METHOD 


Test Materials.—Teacher attitude was measured with the MTAI. 
This instrument was constructed to measure attitudes related to 
ability to get along with pupils (3). In our view, these attitudes 
bear particularly on the teacher’s ‘affective merit’—or effectiveness 
in satisfying the emotional needs of pupils for security and accept- 
ance. It was scored with the published ‘rights’ minus ‘wrongs’ keys. 

General pupil ratings of teachers were obtained by use of the 
fifty-item “My Teacher” inventory constructed by Leeds (11, 12) 
to appraise pupils’ attitudes toward their teachers. This inventory 
was used to provide an exact replication of previous validations 
of the MTAI. The questions in this inventory deal with the teach- 
er’s disposition, his interest in children’s activities, his status in 
the pupils’ esteem, his sense of humor, and similar matters per- 
tinent to teacher-pupil relations in the pupils’ frame of reference. 
A rating for each teacher was obtained by scoring each pupil’s 
Yes-No-? responses (number of favorable minus the number of 
unfavorable ratings of the teacher) and then computing the mean 
over pupils. 
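
The scoring rule just described can be sketched in a few lines of Python. This is an illustration only: the four-item key and the pupil responses are hypothetical (the actual inventory has fifty items), and the key is needed because a Yes may be favorable on one item and unfavorable on another.

```python
from statistics import mean

# Hypothetical key: the answer counted as favorable for each item.
FAVORABLE_KEY = ["Yes", "No", "Yes", "Yes"]


def pupil_score(responses, key=FAVORABLE_KEY):
    """Favorable minus unfavorable answers; a '?' counts as neither."""
    score = 0
    for answer, favorable in zip(responses, key):
        if answer == "?":
            continue
        score += 1 if answer == favorable else -1
    return score


def teacher_rating(class_responses, key=FAVORABLE_KEY):
    """Mean of the pupil scores over all pupils in the class."""
    return mean(pupil_score(r, key) for r in class_responses)


# Hypothetical responses from three pupils about one teacher.
class_responses = [
    ["Yes", "No", "Yes", "Yes"],
    ["Yes", "Yes", "?", "No"],
    ["?", "No", "Yes", "?"],
]
pupil_scores = [pupil_score(r) for r in class_responses]
rating = teacher_rating(class_responses)
print(pupil_scores, round(rating, 2))
```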

Pupils’ ratings of logically defined characteristics of teachers 
were obtained with an extension of Leeds’ “My Teacher” inventory. 
To test our hypotheses we needed three additional inventories, 
for logically more specific kinds of teacher merit. A mean rating 
for each teacher was obtained for each of these three measures by 
averaging pupil scores consisting of the number of favorable re- 
sponses. A brief description of these measures follows: 

A) General Merit, or the extent to which a teacher is liked or 
disliked, without specification of any particular kind of reason for 
the like or dislike. Two of the eleven items of this inventory follow: 


57. Would you like to move to a different teacher 
    [illegible] ............................ Yes No ? 
63. Are most of the pupils glad to be in this 
    [illegible] ............................ Yes No ? 


B) Cognitive Merit, or the extent to which a teacher is seen as 
effective in helping the pupil achieve the traditional cognitive, 
intellectual, subject matter objectives of school learning. Two of 
the twelve items in this inventory follow: 


6. Does this teacher explain school work so that 
   you can understand it? .................. Yes No ? 
56. Do you learn a lot of things from this teacher? . Yes No ? 


C) Affective Merit, or the extent to which a teacher is seen as 
effective in helping the pupil satisfy his social and emotional needs, 
especially through providing a warm and supportive personal re- 
lationship. Two of the eleven items in this inventory follow: 


3. Does this teacher, whose name is written above, 
praise you for doing good work?......... Yes No ? 
4. Does this teacher scold the pupils a lot?..... Yes No ? 


These three logically defined inventories contained a total of 
thirty-four items, obtained as follows: Fifteen of the thirty-four 
items were written by the authors to fit the definitions of the cate- 
gories; the other nineteen items were selected from among those 
already available in Leeds’ “My Teacher” inventory because they 
seemed to fit our definitions of the categories. Eleven judges 
(educational psychologists) were given the thirty-four items to classify 
according to the three definitions. At least eight of the eleven judges 
agreed on the classification of all thirty-four items; on seventeen 
of the items, the judges were unanimous. 

Pupils’ cognitive-affective values with respect to teachers were 
measured with a forced-choice inventory (“Choosing Things About 
Teachers”) which was designed to reveal how much pupils value 
the two kinds of teaching merit (cognitive and affective) involved 
in our hypotheses. This inventory included thirty-six items, each 
containing two phrases—one describing cognitive merit and one 
describing affective merit. Pupils were asked to check the phrase 
in each pair which described the quality they wanted more in a 
teacher. A sample item follows: 

Which do you want more? A teacher who 
1. doesn’t make us afraid or, 
2. explains so we can understand. 

Pupil scores consisted of the number of ‘cognitive choices’ made. 
A mean cognitive-affective value score was obtained for each class. 
This mean score represents the kind of teacher a given class tends 
to value and is not a rating of the actual teacher. Due to the forced- 
choice nature of the test, a high class mean indicates high cognitive 
value orientation, while a low class mean indicates high affective 
value orientation. 
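
Because every item forces a choice between one cognitive and one affective phrase, the two orientations are complementary: on a thirty-six-item form, the number of affective choices is simply thirty-six minus the number of cognitive choices. A minimal sketch, with hypothetical pupils and illustrative function names:

```python
from statistics import mean

N_ITEMS = 36  # number of forced-choice pairs in the inventory


def cognitive_score(choices):
    """Number of cognitive choices; 'C' is cognitive, 'A' is affective."""
    if len(choices) != N_ITEMS:
        raise ValueError("expected one choice per item")
    return sum(1 for c in choices if c == "C")


def class_mean_cognitive_value(pupils):
    """Mean cognitive value score over the pupils of one class."""
    return mean(cognitive_score(p) for p in pupils)


# Three hypothetical pupils in one class.
pupils = [
    ["C"] * 24 + ["A"] * 12,
    ["C"] * 18 + ["A"] * 18,
    ["C"] * 12 + ["A"] * 24,
]
class_mean = class_mean_cognitive_value(pupils)
affective_mean = N_ITEMS - class_mean  # complement on a forced-choice form
print(class_mean, affective_mean)
```

This is why a single class mean suffices to place a class on the cognitive-affective dimension.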

Subjects.—Ninety-seven of the ninety-eight fourth-, fifth-, and 
sixth-grade classes in a mid-western city of 66,000 population par- 
ticipated in the study. This group, consisting of ninety-seven 
teachers and their 2,704 pupils, is similar to the groups used in 
validation studies by the authors of the MTAI. Class size ranged 
from six to thirty-nine pupils, with a median of twenty-six. Pupils 
and teachers were assured that individuals and schools would not 
be identified. 

Administration.—All devices were administered by graduate stu- 
dents from outside the school system. A detailed manual of direc- 
tions was followed to maximize uniformity of procedure. The 
teachers took the MTAI in another room while their pupils were 
filling out the various inventories. All pupils finished in less than 
fifty minutes. The tests were given in late spring, after pupils had 
been with their teacher for more than six months. 


RESULTS 


Characteristics of Tests——(Reliability) The reliabilities of the 
MTAI, Leeds, Affective merit, and General merit inventories were 
all above .90. The Cognitive merit reliability estimate was .83. 
Guttman’s L, coefficient (6) was used for the MTAI estimate, and 
Horst’s coefficient (10) was used for the other estimates. These 
measures appear to differentiate among teachers and classrooms 
with sufficient accuracy for testing our hypotheses. 

Two estimates of the reliability of the Cognitive-Affective value 
scores were obtained: A Kuder-Richardson Formula 21 estimate 
of .89 and a Horst estimate of .74. While this K-R 21 coefficient 
indicates that our measures of the Cognitive-Affective Values of 
individual pupils are internally consistent, it is irrelevant when 
we are using mean scores of a class to test our hypotheses. Because 
the Horst coefficient for Cognitive-Affective Value class means was 
only .74, we used only the classes at the extremes of the distribu- 
tion of these means for testing hypotheses involving Cognitive- 
Affective Value scores. Hence, in one method of testing Hypothesis 
II, below, we used only the twenty classes with the highest mean 
Cognitive Value scores and the twenty classes with the highest 
mean Affective Value scores. For the other method described be- 
low, the K-R 21 coefficient of .89 appears adequate. 
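
For reference, Kuder-Richardson Formula 21 estimates reliability from the number of items k, the mean M, and the variance s^2 of the total scores: KR21 = (k / (k - 1)) (1 - M(k - M) / (k s^2)). The sketch below applies the formula to a small set of hypothetical pupil scores on a thirty-six-item inventory. The scores are invented for illustration, and the use of the population variance is an assumption, since the text does not say which variance estimate was used.

```python
from statistics import mean, pvariance


def kr21(scores, n_items):
    """Kuder-Richardson Formula 21 from total scores on n_items items."""
    k = n_items
    m = mean(scores)
    var = pvariance(scores)  # population variance (an assumption)
    if var == 0:
        raise ValueError("scores must vary for reliability to be defined")
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var))


# Hypothetical cognitive value scores for five pupils.
scores = [6, 12, 18, 24, 30]
estimate = kr21(scores, 36)
print(round(estimate, 2))
```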

(Intercorrelations) As noted below, one of our hypotheses de- 
pends upon distinctions between logically defined types of teacher 
merit as judged by pupils. Unless these measures are substantially 
independent, we cannot use them to test hypotheses concerning 
the differential relevance of the MTAI to teacher effectiveness. 
The intercorrelations shown in Table 1 indicate how our logical 
distinctions held up empirically. The r between the logically and 
operationally independent Affective and Cognitive Merit scores is 
.68. Taking into account the reliabilities and intercorrelation of 
these mean pupil scores, we find, using Bennett’s monograph (4, 
p. 152), that only about twenty-one per cent of the differences in 
these test scores may be considered in excess of chance expectation. 
When corrected for attenuation, their intercorrelation becomes .79. 
Tests of hypotheses involving use of these scores as differential 
measures are therefore unlikely to yield significant results; we are 
especially likely to find the null hypothesis tenable when it is in 
fact false. This overlap does not, however, preclude using these 
scores as measures of pupils’ general approval of teachers. 

TABLE 1. CORRELATIONS AMONG MTAI AND MEAN PUPIL SCORES OVER 
ALL TEACHERS (N = 97) 

Source                                2       3       4       5       6 
1. MTAI                             .256*   .210*   .262**  .248*   .04 
2. Leeds’ “My Teacher” Inventory              †       †       †     .05 
3. Affective merit                                  .68**   .84**   .10 
4. Cognitive merit                                          .70**   .08 
5. General merit                                                    .03 
6. Cognitive value 

* Significant at .05 level. 
** Significant at .01 level. 
† Since scores 3, 4, 5 are based in part on items also contained in score 
2, their correlations with the latter are omitted. 

The Leeds inventory asked pupils to describe their own teacher; 
the values inventory asked them to check which characteristic of a 
teacher they would prefer if they had to choose between a 
‘cognitive’ and an ‘affective’ virtue. Table 1 shows that the correlations 
of the means of the cognitive value scores with the means of each 
of the ratings of teachers are very low; our inventory of pupils’ 
values measured something different from pupils’ attitudes toward 
their own teacher. 


TESTS OF HYPOTHESES 


Hypothesis I. Teachers’ attitudes as measured by the MTAI cor- 
relate positively with mean pupil ratings of teachers on the Leeds in- 
ventory. Our obtained coefficient of .26, significant at the .05 level, 
corroborates previous studies with similar teacher-pupil 
populations (12, 13). That is, the teachers’ MTAI scores have a significant 
positive correlation with their concurrent ratings by their pupils 
on the Leeds inventory. 

As shown in Table 1, the correlations between teachers’ MTAI 
scores and their Affective merit, Cognitive merit and General merit 
ratings were all positive and significant at or beyond the .05 level. 
Since these scores were highly correlated with Leeds scores they 
simply offer additional evidence for a significant positive correla- 
tion between teachers’ MTAI scores and pupils’ general approval 
of teachers. From our earlier discussion, however, our prediction is 
that this relationship will change when pupil value orientation is 
taken into account. Our next hypothesis bears on this point. 

Hypothesis II. Teachers’ MTAI scores correlate with pupil liking 
for teachers more highly where pupils value more highly the 
social-emotional need mediating behavior of teachers. Specifically, for each 
affective merit-type measure (general merit rating, affective merit 
rating, and Leeds “My Teacher” rating) the correlation between 
MTAI score and affective merit-type measures for the most 
affectively oriented pupils will be greater than that for the most 
cognitively oriented pupils. In brief, we hypothesize that: 

$$ r_{(\text{MTAI vs. affectively oriented pupils' ratings})} > r_{(\text{MTAI vs. cognitively oriented pupils' ratings})} $$

Two methods of analysis were used to test this hypothesis. 

TABLE 2. CORRELATIONS BETWEEN MTAI AND PUPIL RATINGS FOR 
TEACHERS GROUPED ON THE BASIS OF THEIR PUPILS’ MEAN 
COGNITIVE-AFFECTIVE VALUE SCORE 

                                 Pupil Ratings Correlated with MTAI 
Class Cognitive Value           General   Cognitive   Affective   Leeds 
Standing                  N     merit     merit       merit       rating 
Highest                   20    .24       .15         .05         .05 
Middle                    57    .16       .18         .20         .24 
Lowest                    20    .59**     .58**       .50*        .57** 

* Significant at .05 level. 
** Significant at .01 level. 

IIa. Here we divided pupils into high and low cognitive value 
groups on the basis of the mean score of their class. Table 2 sum- 
marizes results of this analysis. The differences between r’s are all 
in the hypothesized direction. Although not hypothesized, the cor- 
relation between MTAI scores and cognitive merit ratings also 
changes in this direction. In view of the high correlation of cognitive 
merit ratings with affective merit, general merit and Leeds ratings, 
this result is to be expected. The difference between high cognitive 
and high affective classes in the correlations of MTAI scores with 
Leeds ratings is significant at the .05 level. Here the r for high 
cognitively oriented classes is .05, while that for low cognitively 
oriented classes is .57. The test of significance used here was between 
two z-transformations of r, for N1 = N2 = 20; since the direction 
of the difference was hypothesized in advance, a one-tail test of 
significance was used. 
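
The test described here can be reproduced approximately with the usual Fisher z procedure for two independent correlations. The sketch below is an illustration rather than the authors’ own computation: it assumes the conventional standard error sqrt(1/(N1 - 3) + 1/(N2 - 3)), which the text does not state explicitly, and the function names are illustrative.

```python
import math


def fisher_z(r):
    """Fisher z-transformation of a correlation coefficient."""
    return math.atanh(r)


def compare_independent_r(r1, n1, r2, n2):
    """One-tail test that r1 exceeds r2 in two independent samples."""
    z_diff = fisher_z(r1) - fisher_z(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = z_diff / se
    # Upper-tail probability of the standard normal distribution.
    p_one_tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_one_tail


# Low cognitive value classes (r = .57) against high (r = .05), N = 20 each.
z, p = compare_independent_r(0.57, 20, 0.05, 20)
print(round(z, 2), round(p, 3))
```

Under these assumptions the one-tail probability falls below .05, in agreement with the significance level reported in the text.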

This significant difference might be due to differences in the 
heterogeneity (or ‘range of talent’) of these scores in the two groups. 
Table 3 presents data bearing on this possibility. The differences in 
the variances and reliability coefficients were not significant. In 
fact, the only significant differences noted between the high and 
low cognitive value groups are differences between the means for 
each group of teachers on the MTAI and Leeds devices. These 
differences have no bearing on the correlations under consideration. 
So differences in group heterogeneity with respect to the variables 
under consideration do not account for the difference between the 
correlations of the MTAI with the Leeds scores in the high and 
low cognitive value groups. Indeed, when the correlations were 
corrected for attenuation, they differed even more than previously; 
the corrected correlation between MTAI and Leeds score for the 
high cognitive group was .05; for the low cognitive group, the 
corrected r was .63. 

TABLE 3. RELIABILITY OF MTAI AND LEEDS SCORES IN LOW AND HIGH 
COGNITIVE VALUE GROUPS 

Group             Mean*    Range              SD      Reliability** 
MTAI score 
  Highest 20      72.00    12 to 115          29.8    .90 
  Lowest 20       61.75    -36 to 106         37.1    .92 
Leeds score 
  Highest 20      19.28    -9.16 to 36.04     9.79    .93 
  Lowest 20       16.02    -5.31 to 30.04     9.87    .90 

* On both scores, the means of the two groups are significantly different 
at the .001 level of confidence. 
** For the MTAI this is Guttman’s L, coefficient (6); for the Leeds score, 
this is Horst’s coefficient (10). 
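
The corrected values can be checked against the standard correction for attenuation, in which an observed correlation is divided by the square root of the product of the two reliabilities. The sketch below applies that formula to the observed correlations and the Table 3 reliabilities; it is offered as a check on the arithmetic, on the assumption that this is the correction the authors applied, and the function name is illustrative.

```python
import math


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Observed correlation divided by sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)


# High cognitive value classes: r = .05; MTAI rel. .90, Leeds rel. .93.
high = correct_for_attenuation(0.05, 0.90, 0.93)
# Low cognitive value classes: r = .57; MTAI rel. .92, Leeds rel. .90.
low = correct_for_attenuation(0.57, 0.92, 0.90)
print(round(high, 2), round(low, 2))
```

Because the reliabilities are all near .90, the correction raises the larger correlation by several hundredths and leaves the near-zero correlation essentially unchanged, which is why the gap widens.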

IIb. A second method of testing Hypothesis II is the following: 
the correlation between teachers’ MTAI score and affective 
merit-type rating based on the most affectively oriented pupils in each 
class will be greater than that based on the most cognitively 
oriented pupils in each class. Here, Hypothesis II takes the form: 

$$ r_{(\text{MTAI vs. rating by most affectively oriented pupils in each class})} > r_{(\text{MTAI vs. rating by most cognitively oriented pupils in each class})} $$

In this method of testing Hypothesis II, we divide the pupils in 
each class on the basis of their cognitive-affective values. We then 
compute two mean Leeds ratings for each teacher: one for the most 
affectively oriented pupils and another for the most cognitively 
oriented pupils. The teachers’ MTAI scores are then correlated 
against each of the two sets of mean ratings. 

TABLE 4. CORRELATIONS BETWEEN MTAI AND PUPIL RATINGS 
FOR PUPILS GROUPED ON THE BASIS OF THEIR 
COGNITIVE-AFFECTIVE VALUE SCORE 

                                No. of          Leeds ratings              MTAI                        r (Leeds 
Group of Pupils in Each         Groups                                                      r          vs. MTAI, 
Class According to              and of          SD of   Reliability                 Relia-  (Leeds     corrected for 
Cognitive Value Score           Teachers  Mean  means   (Horst)      Mean   SD      bility  vs. MTAI)  attenuation) 

Above median of their class        97     22.22   9.85    .83        56.80  35.63    .90     .21        .24 
Below median of their class        97     19.36  11.97    .85        56.80  35.63    .90     .25        .29 

All pupils with 21 or above        64     20.66  10.91    .80        52.03  33.86    .90     .29        .35 
All pupils with 15 or below        64     17.27  13.28    .79        52.03  33.86    .90     .35        .42 

5 pupils with highest scores,      64     23.52  10.56    .52        52.03  33.86    .90     .21        .31 
  of those 21 or above 
5 pupils with lowest scores,       64     16.62  15.13    .74        52.03  33.86    .90     .27        .33 
  of those 15 or below 

Table 4 shows the results obtained by three procedures for di- 
viding the pupils in each class into high and low cognitive value 
groups. The first procedure was to divide the pupils in each class 
into groups that were above and below the median cognitive 
value score of their own class. For the pupils above the median, 
the correlation of their Leeds ratings and their teachers’ MTAI 
scores was .21; for pupils below the median this r was .25. Corrected 
for attenuation due to the imperfect reliability of the mean Leeds 
ratings and MTAI scores, they become .24 and .29, respectively. 

The second procedure for dividing the pupils in each class was 
to choose all pupils who had a cognitive value score of 21 or above 
for the high cognitive value group, and all those with scores of 15 
or below for the low cognitive value group. Since not all classes 
had at least five pupils with scores above or below these cutting 
points, we were able to use only sixty-four classes in this analysis 
of the data. Here the r for the high cognitive value group was .29, 
while that for the low cognitive value group was .35. Corrected for 
attenuation in both MTAI and mean Leeds ratings, the coefficients 
become .35 and .42, respectively. 

The final procedure for dividing the pupils was to use only the 
five pupils with the highest scores, of those pupils who had cogni- 
tive value scores of 21 or above, and only the five pupils with the 
lowest scores, of those that had scores of 15 or below. This was done 
to make the numbers of pupils the same in the high and low cog- 
nitive value groups. This would also make the r’s between Leeds 
ratings and MTAI scores more comparable. The correlations 
obtained with these kinds of groups were .21 for the high cognitive 
group and .27 for the low cognitive group. Corrected for 
attenuation, these r’s become .31 and .33, respectively. 

Obviously, the differences between r’s by each of the three pro- 
cedures within this second method of analysis are not nearly so 
large or significant as that obtained by the first method of analysis, 
where the twenty highest and lowest cognitive value classrooms 
were used intact. The differences are all in the hypothesized direc- 
tion, by this second method of analysis. They are probably not 
statistically significant although a suitable method of testing the 
significance of differences between such r’s is not available (14, 
pp. 217-218). 

How might we explain the differences in results between these 
two methods of analysis? However speculative and a posteriori, 
the following two arguments may be considered: First, it is possible 
that the intact classroom must be the unit of analysis. In breaking 
up classes into high and low cognitive value sets of pupils, we may 
be losing sight of the possibility that, when pupils rate their teach- 
ers, they are influenced not by their own individual values but 
by their perception of the entire class’s (mean) values. In other 
words, the disagreement in results between our two methods of 
analysis suggests that the entire class’s mean of pupils’ values may 
have provided a frame of reference for the entire class and influ- 
enced their Leeds ratings accordingly. Breaking up the class into 
high and low cognitive value groups would not destroy the in- 
fluence of this class-wide frame of reference. 










Also, the less definitive results of the second method of analysis 
may be accounted for by the fact that the Leeds ratings of each 
teacher are in this instance based on fewer pupils. This makes the 
mean ratings less reliable; hence the two correlations of these mean 
ratings with the MTAI both regress toward zero and become more 
equal. The correction for attenuation applied to the two more equal 
r’s, since it raises both r’s about equally, cannot then do justice to 
any hypothesis involving differences between r’s. 

At any rate, the two methods of analysis agree in the direction 
of the differences between r’s yielded by them. In this sense, 
Hypothesis II is supported: at the .05 level by the first method of 
analysis, but without statistical significance by the second. 

Hypothesis III. Teachers’ MTAI scores correlate positively with 
the cognitive value orientation of their classes.—This hypothesis is 
based on two assumptions: (a) A teacher with a higher MTAI 
score is likely to be more successful in satisfying his pupils’ social- 
emotional needs; his pupils will tend less to need or want this kind 
of need-satisfying behavior from a teacher. (b) With social-emo- 
tional needs satisfied, other needs become dominant; in the class- 
room, the need to learn ‘subject matter’ is likely to become the 
dominant one for most pupils. 

The negligible correlation between MTAI and cognitive value 
(r = .04) does not, however, support this hypothesis. 


SUMMARY 


Current theory suggests that leadership consists of an interaction 
between leaders’ characteristics and followers’ values and needs. 
Within this framework, we hypothesized that pupils’ liking of 
teachers is a function of the interaction between pupils’ values and 
teachers’ attitudes. This means that the significant positive rela- 
tionship (corroborated in this study) between teachers’ scores on 
the Minnesota Teacher Attitude Inventory and ratings of the 
teachers by their pupils, should change predictably as pupils’ value 
orientations are varied. A forced-choice values inventory was con- 
structed to measure how much pupils want teachers with cognitive 
merit (effectiveness in helping pupils achieve intellectual objectives 
of school learning) as against affective merit (effectiveness in help- 
ing pupils satisfy their social-emotional needs). For the pupils who 
have stronger affective values, the teachers’ MTAI scores do cor- 
relate more highly with how much they are liked by the pupils. 
Accordingly, the MTAI will vary in validity for teacher effectiveness 
according to the values of the pupils interacting with the 
teacher. Teachers scoring high on the MTAI will probably be bet- 
ter liked by pupils who have strong affective values concerning 
teachers. If the pupils have strong cognitive values, the teacher’s 
MTAI will make less difference. 

The study supports the validity of the interactional point of 
view in the study of teacher-pupil relationships, or leadership in 
general. Attitudes and similar characteristics of teachers depend 
for their significance on the values, needs, and other characteristics 
of their pupils. 


One of the problems inherent in individual intelligence testing 
is evaluating and weighting ambiguous verbal responses to test 
questions. The Comprehension Subtest of the Wechsler-Bellevue 
Scale Form I seems to offer particular difficulty to testers. The pur- 
pose of this study was to determine the amount of agreement ex- 
hibited by a number of judges with varying degrees of training and 
experience in scoring two hundred and fifty-four actual responses 
to the last nine items of the Comprehension Subtest. 

The Wechsler test has been widely used in research since its pub- 
lication in 1939, and has itself been the subject of study. The re- 
sults of these various studies have been summarized in reports in 
1945 (2) and in 1951 (8). 

The Comprehension Subtest of the Wechsler-Bellevue consists 
of ten questions designed to test ‘common sense’ (4, p. 81). These 
items are scored by credits of a maximum of 2 points for the best 
answers, 1 for acceptable responses, and 0 for wrong responses. 
In administering the test the authors of this study frequently re- 
ceived ambiguous, hard-to-score responses to all but the first 
item, as for example in item 5. 

Question 5 (‘Shoes’): Why are shoes made of leather? 
Answers: Don’t tear up easy, look good. 
         Holds better than other material. 

This study was designed to determine the extent of agreement and 
disagreement on such ambiguous responses among examiners of 
varying experience. 


PROCEDURE 


Subjects.—A total of two hundred and fifty-four ambiguous 
responses were selected from three hundred and seventy Wechsler-Bellevue 
scales. These responses were given by a male reformatory 
population ranging in age from seventeen through twenty-nine 
years, with a mean age of 21.4 years. The mean highest school grade 
completed, as reported by the subjects, was 8.6, and the mean grade 
placement as determined by the Progressive Achievement test 
was 7.5. The mean Verbal IQ (W-B I) was 92.4 and the mean 
full IQ as measured by the California Test of Mental Maturity 
was 90.5. Most of these men did manual labor before entering the 
institution. Seventy per cent were classified as unskilled workers, 
twenty-five per cent were semi-skilled and five per cent were in 
clerical occupations. 

Items.—The original responses were recorded verbatim by the 
examiners as the subjects were tested. These responses were mimeo- 
graphed and were given to a number of judges, of varying training 
and experience, to be evaluated. 

Judges.—Two groups of judges were used. The first was an 
experienced, ‘expert’ group of twenty-four psychologists. Three had 
administered one hundred or more Wechsler-Bellevue scales, two 
had administered one hundred and fifty or more, and nineteen 
had given over two hundred. Eleven were ABEPP Diplomates in 
clinical psychology. Six were Associates in the Clinical or Counsel- 
ing and Guidance Divisions of the A.P.A. Seven were not members 
of the A.P.A., but were recommended by Diplomates as being 
qualified authorities in the administration and scoring of this test. 
The judges worked as professional psychologists in eleven states. 

The second group of judges was made up of psychology graduate 
students with varying degrees of training and experience in 
administering individual tests. 


RESULTS 


The scoring of these two sets of judges is summarized in the 
tables. Individual experts gave a rating of 2 to between twelve and 
thirty-eight per cent of the two hundred and fifty-four responses, a 
rating of 1 to between thirty-five and sixty-seven per cent, and a 
rating of 0 to between nineteen and forty-six per cent. The variation 
among the students is about as great, and the averages of the two 
groups of judges show very little difference (Table 1). 

TABLE 1.—RESPONSE EVALUATION BY JUDGES 

                     By Experts                By Students 
W-B Score        Range      Average        Range      Average 
                per cent    per cent      per cent    per cent 
    2            12-38         23          18-31         23 
    1            35-67         46          26-42         48 
    0            19-46         31          17-43         29 


TABLE 2.—SCORING DISAGREEMENT BY JUDGES 

                       Expert                  Student 
Item             Less than 2/3 agree     Less than 2/3 agree 
                      per cent                per cent 
 2 Theatre               48                      56 
 3 Company               69                      62 
 4 Taxes                 41                      41 
 5 Shoes                 40                      14 
 6 Land                  32                      40 
 7 Forest                65                      56 
 8 Law                   31                      38 
 9 Marriage              48                      54 
10 Deaf                  74                      56 
Av.                      52                      49 











Table 2 shows the extent of disagreement among the judges in 
scoring each item. ‘Disagreement’ was defined here arbitrarily as 
failure of two-thirds of the judges to agree on scoring an item 2, 
1, or 0. Among the expert judges, agreement was not reached for 
thirty-one per cent (Law) to seventy-four per cent (Deaf) of the 
responses. The student judges did not reach the agreement cri- 
terion for fourteen per cent of the responses (Shoes) to sixty-two 
per cent (Company). For both groups of judges, lack of two-thirds 
agreement occurred about half the time on the average. 

Further analysis of the data indicates that if the criterion of 
three-fourths agreement were established, the judges would have 
failed to agree on sixty-five per cent of the responses; and the per- 
centage of disagreement would increase to eighty-four if seven- 
eighths agreement were required. 
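
The agreement criterion described above can be sketched as a small function: a response counts as "agreed" when the most frequently assigned score was given by at least the stated fraction of judges. The ratings below are hypothetical and are not data from the study. Treating exactly two-thirds as meeting the criterion is an assumption about how the authors applied the rule.

```python
from collections import Counter
from fractions import Fraction


def reaches_agreement(scores, criterion=Fraction(2, 3)):
    """True if the most common score was given by at least `criterion` of the judges."""
    if not scores:
        raise ValueError("need at least one judge's score")
    modal_count = Counter(scores).most_common(1)[0][1]
    return Fraction(modal_count, len(scores)) >= criterion


def percent_disagreement(responses, criterion=Fraction(2, 3)):
    """Percentage of responses on which the judges fail to reach the criterion."""
    failed = sum(not reaches_agreement(s, criterion) for s in responses)
    return 100 * failed / len(responses)


# Hypothetical ratings (2, 1, or 0) by six judges for three responses.
ratings = [
    [2, 2, 2, 2, 1, 1],  # four of six agree: meets 2/3, fails 3/4
    [1, 1, 1, 1, 1, 1],  # unanimous
    [2, 1, 0, 1, 2, 0],  # no score reaches 2/3
]

for crit in (Fraction(2, 3), Fraction(3, 4), Fraction(7, 8)):
    print(crit, round(percent_disagreement(ratings, crit), 1))
```

Using exact fractions avoids floating-point edge cases when a group sits exactly on the criterion. As in the study, tightening the criterion can only raise the percentage of disagreement.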

An analysis of the IQ’s of subjects in relation to the difficulty of 
scoring their responses failed to disclose any significant relationship, 
nor was there any one type of response most puzzling to the 
judges. 


DISCUSSION 


It is apparent that the Comprehension Subtest elicits ambiguous 
and hard-to-score responses. The marked similarity of scoring by 
experts and graduate students suggests that experience is not of 
any considerable help in evaluating ambiguous responses. Admittedly, 
it is easier (or at least it might give the tester a feeling of 
self-confidence) to evaluate the spoken rather than the printed responses 
of the subject. One of the reasons that response analysis would 
seem easier in the actual testing process is that additional question- 
ing may elicit clearer answers. However, the amount of questioning 
permissible may not be clear to either the experienced or inex- 
perienced tester. The manual is not particularly helpful here. 

Certain kinds of responses may be mentioned as causing varia- 
tion in scoring, although these do not account for all the difficulties: 

1) A multiple response may be confusing because each part 
properly should receive a different value. For example, “Keeps 
government going, keep highways up and such,” was one response to 
the ‘taxes’ item. The first part by itself would be a 2-response 
according to the manual, and the second part would be a 1-response, but 
together they produce confusion as to the proper score to assign. 

2) Words of similar but slightly different meaning may produce 
confusion in evaluating a response. For example: “Holds better 
than other material,” for the ‘shoes’ question. Would this response 
be less confusing if “wears” were substituted for “holds”? 

3) A response not similar to those given in the manual may be 
more confusing than a response similar to those given. For example: 
“Pick up bad habits,” in response to the question “Why should we 
keep away from bad company?” Does this mean “become like 
them” as given in the manual? 

The above examples might sound easy, and it might be supposed 
that the correct evaluations would be assigned to them, but the 
judges in this study divided equally between the 2 and 1 values on 
these very responses. 

It is apparent that these ambiguous responses to the Comprehen- 
sion Subtest cause difficulty for testers with varying amounts of 
experience. To aid in alleviating this difficulty, it might be suggested 
that: 

1) The manual for this subtest should contain a wider variety 
of responses, especially those that might be classified on the ‘fringe’ 
of the scoring scale. 

2) There should be more specific directions for evaluating a re- 
sponse which requires questioning by the examiner beyond the 
initial response. Further research on this matter would seem de- 
sirable. 

3) Difficult responses might be recorded and published together 
with the evaluations of competent judges. This might be an aid for 
trainee or experienced tester alike. One such supplementary manual 
is now available (/). 

The need for aids to evaluation will continue, since eight of the 
ten items of this subtest are included in the revision which was 
being standardized at the time this article was written. 


AN APPLICATION OF FISHER’S DISCRIMINANT 
FUNCTION IN THE CLASSIFICATION 
OF STUDENTS 


J. STANLEY AHMANN 


Cornell University 


In a recent article, Johnson (3) briefly described a series of 
statistical methods of relatively recent development which have 
decided promise when applied to educational research. One of 
these techniques mentioned was the discriminant function, intro- 
duced by Fisher (2), which was designed to determine the linear 
combination of numerical measurements which would discriminate 
best the members of one group from those of another group. A 
direct application of this analysis can be found in the activities of 
the counselor or personnel officer who desires to weight several 
characteristics in such a way that maximum discrimination will 
be obtained between groups of students majoring in two different 
curricula, or between groups of applicants eligible for two different 
occupations. 

The discriminant function has seen relatively little use in 
educational research, and none at all in many areas. However, 
a symposium at Harvard University (4) reflected considerable 
interest in the application of the technique, and undoubtedly part 
of that interest manifested itself in one of the section meetings held 
at the 1954 Convention of the American Personnel and Guidance 
Association. At this time, Alman (1) reported the usefulness of the 
discriminant function in the classification of college students. He 
determined the similarities and dissimilarities in terms of Kuder 
Interest Inventory scores which existed among major fields of 
study in the College of Liberal Arts at Boston University. 


THE PROBLEM 


Another application of Fisher’s discriminant function was found 
in a problem confronting some of the faculty members serving as 
student advisors at the New York State College of Agriculture at 
Cornell University. Because of the heterogeneous nature of the 
curricula included in this college, a rather widely accepted though 
unofficial classification of these curricula has been used. In the one 
group, commonly known as the agricultural science group, are 
listed curricula with such majors as bacteriology, biochemistry, 
biology, entomology, agronomy, science teaching, botany, and 
conservation. In a second group, called the general agriculture 
group, are included major areas of concentration such as floricul- 
ture, general farming, poultry, animal husbandry, vegetable farm- 
ing, agricultural economics, fruit farming, and the like. 

Although students are not prevented from transferring from a 
curriculum in one group to one in the other group, there is a notice- 
able reluctance on the part of student advisors to recommend such 
a transfer when a student, who is below average in terms of the 
standardized test results, proposes to change from a general agri- 
culture curriculum to an agricultural science curriculum. Further- 
more, those prospective students who have achieved exceptionally 
well in high school are sometimes encouraged to enter an agricul- 
tural science curriculum rather than a general agriculture curric- 
ulum. 

The problem was then stated as follows: Can students who enter 
agricultural science curricula and successfully continue in those 
curricula be differentiated in terms of available standardized test 
scores from those who enter general agriculture curricula and 
successfully continue in those curricula? A sample of one hundred and 
twenty-eight students was drawn, twenty-nine of whom entered 
agricultural science curricula and ninety-nine of whom entered 
general agriculture curricula. All students entered the College of 
Agriculture in their respective curricula as freshmen in the fall term 
of 1951. In addition, each continued in his chosen curriculum up to and 
including the spring term of 1954, which terminated his junior year. 
All female students and foreign students were excluded from the 
sample. 


THE FINDINGS 


Available for each student as a matter of administrative routine 
were the total raw scores of the Ohio State University Psychological 
Test, the total raw scores of the Cornell General Mathematics 
Test, the speed of comprehension scaled scores of the Coöperative 
Reading Comprehension Test, total scaled scores of the Coöperative 
General Achievement Test, Test II: A Test of General Proficiency 
in the Field of Natural Sciences, and the high school grade-point 
average in terms of per cent. All of these scores were compiled 
during freshman orientation week and were the basis upon which 
discrimination was attempted. 


TABLE 1.—SUMMARY OF THE DATA 

                                          Means 
                                  ----------------------   Point 
                                  Agr. Sci.    Gen. Agr.   Biserial 
Numerical Variable        Symbol  Students     Students    Correlations   t-value 

O.S.U. Psych. Test          X1      78.3         77.1         0.02          0.3 
Cornell Math. Test          X2      37.6         35.7         0.05          0.5 
Coöp. Reading Comp. Test    X3      60.5         59.3         0.06          0.7 
Coöp. G.A.T. Test II:       X4      60.8         57.6         0.10          1.1 
  Proficiency in Science 
High School Grade-Point     X5      87.7         86.1         0.15       [illegible] 
  Average 

The identification of the variables and the mean scores of the 
two groups are shown in Table 1. Although the difference was 
usually slight, the agricultural science group consistently surpassed 
the general agriculture group. The point biserial coefficients of 
correlation which represented the relationship between each of the 
numerical variables and the dichotomy were also small. Testing 
the significance of the difference between each of the correlation 
values and zero was determined by the formula given by Wert 
and others (5) 








In each instance, the t-value was nonsignificant at the five per cent 
level, although the t-value of the high school grade-point average 
was greater than that listed for the ten per cent level. 
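
The formula from Wert and others is not legible in this copy. A minimal sketch, assuming it is the usual t-test of a correlation against zero with N - 2 degrees of freedom, is given below. The function name is illustrative; N = 128 and the coefficients are those stated in the text and in Table 1.

```python
import math


def t_for_correlation(r: float, n: int) -> float:
    """t statistic for testing a correlation against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


N = 128  # total sample size stated in the text

# Point biserial coefficients as printed in Table 1 (rounded to two decimals).
coefficients = [("X1", 0.02), ("X2", 0.05), ("X3", 0.06), ("X4", 0.10), ("X5", 0.15)]

for symbol, r in coefficients:
    print(symbol, round(t_for_correlation(r, N), 2))
```

Because the tabled coefficients are rounded, the computed values can be expected to agree with the printed t-values only approximately.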

The foregoing conclusion that differences between the two 
groups could not be demonstrated for each variable individually 
is not equivalent to saying that it is not possible to discriminate 
between the two groups in terms of the five variables taken as a 
group. Because of this and the fact that one group uniformly 
surpassed the other in terms of the test scores, a discriminant function 
was computed. 

The discriminant function had the general form of: 

$$v = a_1x_1 + a_2x_2 + a_3x_3 + a_4x_4 + a_5x_5$$

where the x’s are the measurements of the different characteristics 
and the a’s are the weights which will produce the maximum 
distinction between the two groups of students. The values of the a’s 
were found from the simultaneous solution of the following five 
normal equations: 

$$
\begin{aligned}
d_1 &= a_1\sum x_1^2 + a_2\sum x_1x_2 + a_3\sum x_1x_3 + a_4\sum x_1x_4 + a_5\sum x_1x_5 \\
d_2 &= a_1\sum x_1x_2 + a_2\sum x_2^2 + a_3\sum x_2x_3 + a_4\sum x_2x_4 + a_5\sum x_2x_5 \\
d_3 &= a_1\sum x_1x_3 + a_2\sum x_2x_3 + a_3\sum x_3^2 + a_4\sum x_3x_4 + a_5\sum x_3x_5 \\
d_4 &= a_1\sum x_1x_4 + a_2\sum x_2x_4 + a_3\sum x_3x_4 + a_4\sum x_4^2 + a_5\sum x_4x_5 \\
d_5 &= a_1\sum x_1x_5 + a_2\sum x_2x_5 + a_3\sum x_3x_5 + a_4\sum x_4x_5 + a_5\sum x_5^2
\end{aligned}
$$


The d-values were the differences between the group means on each 
variable, whereas the sums of squares and sums of cross-products were 
the within-group deviation values. Substitution of the appropriate 
values yields, upon solution, the discriminant function: 

$$v = -0.00007x_1 + 0.00010x_2 + 0.00012x_3 + 0.00010x_4 + 0.00061x_5$$

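
The solution of the normal equations can be sketched as follows. This is an illustration under stated assumptions, not the author's computing routine: the helper names are hypothetical, the scores are toy data with two variables, and the equations are solved by ordinary Gaussian elimination.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * a = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    a = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * a[k] for k in range(row + 1, n))
        a[row] = (aug[row][n] - tail) / aug[row][row]
    return a


def discriminant_weights(group1, group2):
    """Return (weights a, mean differences d) for two groups of score vectors."""
    m = len(group1[0])
    means1 = [sum(row[j] for row in group1) / len(group1) for j in range(m)]
    means2 = [sum(row[j] for row in group2) / len(group2) for j in range(m)]
    d = [means1[j] - means2[j] for j in range(m)]
    # Pooled within-group sums of squares and cross-products.
    w = [[0.0] * m for _ in range(m)]
    for group, means in ((group1, means1), (group2, means2)):
        for row in group:
            dev = [row[j] - means[j] for j in range(m)]
            for i in range(m):
                for j in range(m):
                    w[i][j] += dev[i] * dev[j]
    return solve_linear_system(w, d), d


# Toy data with two variables (hypothetical scores, not the study's).
science = [(80, 62), (75, 58), (82, 66), (78, 60)]
general = [(76, 57), (74, 59), (79, 55), (73, 56)]
weights, diffs = discriminant_weights(science, general)
print(weights, diffs)
```

With five variables the same two functions apply unchanged; only the length of each score vector differs.
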
To determine whether the two groups of students could be sep- 
arated on the basis of these five variables, the F-test proposed by 
Wert and others (5) was used: 


F = [formula illegible in the source] 


where N = total number of cases 
k = number of cases in a group 
m = number of variables 
$D = a_1d_1 + a_2d_2 + a_3d_3 + a_4d_4 + a_5d_5$


Solving the equation yielded an F-value of 0.81 which is non- 
significant. 
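
The printed F formula cannot be read in this copy. The sketch below therefore assumes a standard equivalent rather than the exact expression of Wert and others. When the weights come from the within-group sums of squares and cross-products, the two-group test reduces to F = ((N - m - 1) / m) (k1 k2 / N) D, with m and N - m - 1 degrees of freedom. The function name is hypothetical. The weights are the rounded values printed in the text, and the mean differences are taken from Table 1.

```python
def discriminant_f(a, d, k1, k2):
    """F ratio for a two-group discriminant function, with its degrees of freedom."""
    m = len(a)
    n = k1 + k2
    big_d = sum(ai * di for ai, di in zip(a, d))  # D = sum of a_i * d_i
    f = (n - m - 1) / m * (k1 * k2 / n) * big_d
    return f, (m, n - m - 1)


# Weights as printed (rounded) and mean differences from Table 1.
a = [-0.00007, 0.00010, 0.00012, 0.00010, 0.00061]
d = [78.3 - 77.1, 37.6 - 35.7, 60.5 - 59.3, 60.8 - 57.6, 87.7 - 86.1]

f_value, df = discriminant_f(a, d, k1=29, k2=99)
print(round(f_value, 2), df)
```

With the rounded published weights this gives an F of about 0.85, near the reported 0.81. The gap is plausibly due to rounding of the weights and means, and either value is far from significance.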





CONCLUSIONS 


Since the F-value was nonsignificant, it was concluded that 
insufficient evidence was found to reject the hypothesis of no 
difference between the groups. Thus, successful agricultural science 
students can not be discriminated 
from successful general agriculture students in terms of the five 
variables used. Therefore, should a student desire to transfer from 
one curriculum group to another, evidence to support or not sup- 
port the application would have to be found from sources other 
than the five variables used in this study. 

Certainly this information is useful to the student advisors in 
the College of Agriculture. It is apparent that additional information 
can be made available to counselors and personnel officials 
confronted with comparable problems by the application of the 
discriminant function in the manner illustrated above or in a more 
detailed treatment such as that used by Alman (1). 


THE EFFECT OF READING ABILITY ON TWO 
STANDARDIZED CLASSIFICATION TESTS 


CECIL J. MULLINS 


Lee College, Baytown, Texas 


Most vocational and educational counselors probably wonder, 
at some time or other, how much timed tests of achievement or 
aptitude are likely to be influenced by the reading skill of the person 
taking the tests. Inasmuch as almost all timed tests of aptitude, 
general mental ability, and achievement require the subject to do 
some reading which is included in the over-all time for the test, it 
seems reasonable to expect reading ability to influence perform- 
ance on the tests. 

This paper intends to make a start toward the investigation of 
the relationship existing between reading ability and timed tests of 
various attributes. On such a huge project, of course, only a start 
can be made by a single paper. This paper, then, confines itself to 
an investigation into the relationship existing between reading abil- 
ity and two widely used classification tests, the ACE (Q-score 
only) and the Coöperative Mechanics of Expression, Form A. 


GROUP STUDIED AND MATERIALS USED 


Group studied.—The group studied in this investigation was a 
sample of freshman English students at the University of Houston. 
The correlation between ACE Q-Score and reading speed was ob- 
tained on a sample of four hundred students. The correlation be- 
tween Mechanics of Expression and reading speed was obtained on 
a sample of five hundred and ninety-nine students. 

Materials used.—Reading speed was measured by the speed section 
of the Coöperative Reading Comprehension Test. The entire 
test was not used because the investigator wanted to discover to 
what extent other measures were affected by the testee’s speed of 
reading, since the other measures studied were timed measures of 
skill. 

The score on the quantitative section of the ACE was used be- 
cause the investigator believed that if a significant correlation 
could be obtained on this supposed nonlinguistic section, then it 
should logically follow that at least as high a correlation could be 
assumed for the other section and for the total ACE score. 

The Mechanics of Expression score was investigated as a sample 
of measures of achievement in other areas, which, if completely 
successful, should not be affected significantly by the testee’s speed 
of reading, since the tests are designed to measure proficiency in 
other areas than reading speed. 


TECHNIQUES AND ANALYSIS OF DATA 


Pearson product-moment method of correlation.—Pearson 
product-moment correlations were computed between ACE Q-scores and 
speed of reading scores and between Mechanics of Expression 
scores and speed of reading scores, using the scattergram and 
Garrett’s formula 44.¹ This method yields a correlation between two 
variables on the assumption that the regression lines are linear. It 
is very frequently used in standardizing tests. The standard errors 
of the correlations were then computed by Garrett’s formula 51,² in 
order to establish the significance of the obtained correlations. 

The results of the correlations obtained by the above method 
are: 

a) Between Mechanics of Expression and speed of reading a 
correlation (r) of .465 was obtained, with a standard error of .032. 
This means that the relationship between these two variables is 
not extremely high, but is highly significant statistically. 

b) Between ACE Q-score and speed of reading, a correlation (r) 
of .146 was obtained, with a standard error of .049. This correlation 
is barely significant at the .01 level, using this method in its com- 
putation. 
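
Garrett's formula 51 is not reproduced in the text. A minimal sketch, assuming it is the classical large-sample standard error of r, namely (1 - r²)/√N, is shown below. The function name is illustrative; the r and N values are those reported above.

```python
import math


def standard_error_of_r(r: float, n: int) -> float:
    """Large-sample standard error of a correlation: (1 - r^2) / sqrt(n)."""
    return (1 - r * r) / math.sqrt(n)


reported = [
    ("Mechanics of Expression", 0.465, 599),
    ("ACE Q-score", 0.146, 400),
]

for label, r, n in reported:
    se = standard_error_of_r(r, n)
    # The ratio r / SE is what is compared with the normal-curve critical values.
    print(label, round(se, 3), round(r / se, 1))
```

Under this assumption the computed standard errors round to the reported .032 and .049. The ratio of r to its standard error is about 3 for the ACE Q-score, which is consistent with the description of that correlation as barely significant at the .01 level.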

The correlation ratio (eta).—The correlation ratio is a method of 
obtaining the degree of correlation between two variables which 
does not assume a linear relationship. As Garrett points out,³ this 
method could be used for obtaining the degree of relationship in any 
case in which the Pearson product-moment method might be used. 
Actually, the correlation ratio gives us a more exact picture of the 
degree of relationship. If the relationship between the two variables 
is linear, r and eta will coincide exactly. In dealing with human 
variables, however, one so seldom obtains an exactly linear 
relationship that it is frequently safer, though somewhat more 
laborious, to compute a correlation ratio in preference to the 
Pearson product-moment. Computation of the correlation ratio followed 
the formula given by Garrett on page 370,⁴ corrected by his formula 
76.⁵ Standard errors were computed using his formula 75.⁶ 

1 Henry E. Garrett, Statistics in Psychology and Education, p. 287. 
2 Ibid., p. 297. 
3 Ibid., p. 367. 
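
The idea behind the correlation ratio can be illustrated with a short sketch. It computes the uncorrected eta of y on x as the square root of the between-class sum of squares divided by the total sum of squares. It omits the correction the paper applies (Garrett's formula 76), and the data are hypothetical, chosen so that the linear correlation is zero while eta is high.

```python
def correlation_ratio(x_classes, y_values):
    """Uncorrected eta of y on x: sqrt(between-class SS / total SS)."""
    groups = {}
    for x, y in zip(x_classes, y_values):
        groups.setdefault(x, []).append(y)
    grand_mean = sum(y_values) / len(y_values)
    ss_total = sum((y - grand_mean) ** 2 for y in y_values)
    ss_between = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    return (ss_between / ss_total) ** 0.5


# Hypothetical data: y rises and then falls with x, so r is 0 while eta is high.
x = [1, 1, 2, 2, 3, 3]
y = [10, 12, 20, 22, 10, 12]
print(round(correlation_ratio(x, y), 3))
```

Swapping the roles of the two variables generally gives a different eta, which is why the paper reports two correlation ratios for each pair of measures.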

Correlation ratios were computed on the same groups mentioned 
above, with the following results: 

a) Between Mechanics of Expression and speed of reading, cor- 
rected etas of .739 (representing the change in Mechanics of Ex- 
pression scores associated with changes in speed of reading scores) 
and .841 (representing the change in speed of reading scores asso- 
ciated with changes in Mechanics of Expression scores) were ob- 
tained, with standard errors of .019 and .012, respectively. This 
paper is particularly interested in the former of these two etas, 
since it concerns itself with the effect of speed of reading on Me- 
chanics of Expression. Using eta, then, which is a more exact rep- 
resentation of relationship between these two variables, we have 
not only a statistically high significance, but also a rather high 
correlation (much higher than one would expect if the Mechanics 
of Expression Test measures mechanics of expression and little 
else). 

b) Between ACE Q-scores and speed of reading, corrected etas 
of .469 (representing the change in ACE Q-score associated with 
changes in speed of reading scores) and .520 (representing the 
change in speed of reading scores associated with changes in ACE 
Q-scores), were computed, with standard errors of .039 and .037, 
respectively. Again, this paper is particularly interested in the 
former, and, again using eta, we have not only what is a significant 
relationship evidenced, but also a fairly high degree of relationship. 


SUMMARY AND CONCLUSIONS 


In our attempts to measure attributes of achievement and 
scholastic aptitude, we must be careful not to include in our meas- 
urement certain variables which are not controlled, such as the 
effect of reading speed on timed tests. This study has indicated 
that a definite and rather high degree of relationship exists between 
speed of reading and two such measures, quantitative scholastic 
aptitude and mechanics of expression. 

4 Ibid., Formula (73), p. 370. 
5 Ibid., Formula (76), p. 372. 
6 Ibid., Formula (74), p. 371. 

A word of caution in regard to interpretation of these data should 
be mentioned. Although a high degree of relationship obviously 
exists between these variables, it does not necessarily mean that 
a causal relationship exists between them. That is to say, although 
a fairly high degree of relationship exists, one cannot on the strength 
of the above investigation say definitely that the relationship is 
brought about by the influence of speed of reading alone. It is 
entirely possible that some or all of the correlation in evidence can 
be traced to a third variable, perhaps general mental ability. The 
correlations obtained are so high it seems hardly likely, but final 
judgment must be withheld until the results of many other such 
investigations are available. 



















Books and other materials for review, and business communications 
should be addressed to THE JOURNAL OF EDUCATIONAL PSYCHOLOGY, 
Warwick & York, Publishers, 10 E. Center St., Baltimore 2, Md. 






